To meet several pre-evaluation goals of the national
Emergency School Aid Act (ESAA) Evaluation Study, the following
activities are described: selection of an
achievement measure; a pretest of this measure to assess student
needs in schools eligible to receive funds under the Act (schools
with a minority enrollment of 50 percent or more); an important
research effort directed toward possible ethnic and/or cultural bias
in the measures; and the establishment of test norms to aid in the
interpretation of student and school performance relative to the
appropriate sub-population. California Achievement Test (CAT-70)
Levels 2 and 3 subtests measuring Reading Comprehension, Mathematics
Computations, and Mathematics Concepts, along with a questionnaire
describing student background, were administered to a nationally
representative sample of ESAA-eligible students. A descriptive
analysis of the resulting data is provided, and the research conducted
to investigate possible bias in the selected measures is documented.
Appropriate uses of the debiased scales and the implications of
their use are discussed. Finally, the methodology employed for scaling
the tests for use in interpreting performance within the
ESAA-eligible population is presented. (BC)
OVERVIEW OF THE EMERGENCY SCHOOL AID ACT (ESAA) NATIONAL EVALUATION

The Emergency School Aid Act (ESAA) was enacted into law in June of 1972 to
provide elementary and secondary school districts with financial assistance
to: (1) meet the special needs incident to the elimination of minority group
segregation and discrimination; (2) encourage the voluntary reduction, elimination,
or prevention of minority group isolation; and (3) aid children in
overcoming the educational disadvantages of minority group isolation [P.L. 92-318,
Sec. 702(b)]. While the Act as amended in 1974 (P.L. 93-380, Sec. 641) authorizes
the appropriation of one billion dollars for fiscal year 1973 and a similar amount
for the period ending June 30, 1974, actual appropriations have amounted to 270
million and 23^! million dollars for fiscal years 1973 and 1974, respectively, with
the fiscal year 1975 appropriation pending. Since funds are annually appropriated
for obligation and expenditure during the fiscal year succeeding the year of
appropriation, the major thrust of the Act began during school year 1973-74 and
is expected to continue through school year 1976-77.

Seventy-four percent of the Act's annual appropriation is reserved for two
subprograms, the Basic Grants (59%) and the Pilot Programs (15%). The Basic Grant
program is essentially a desegregation program designed to reduce minority group
isolation, meet the needs incident to the elimination of segregation and
discrimination, and aid school children in overcoming the educational
disadvantages of minority group isolation. In contrast, the Pilot program is a
compensatory education program designed to improve the academic achievement of
children in minority-isolated schools (i.e., schools with over 50% minority
enrollment).

The sums annually appropriated pursuant to the Act are apportioned to States on
the basis of the ratio of their number of minority group school-aged children
to the number of such children in all States. Local school districts compete
for the funds apportioned to their State through grant applications to their
HEW Regional Office. In applying for an ESAA grant, a local school district must
demonstrate that it has needs related to the Act's objectives and that it has
designed a program, based upon authorized activities, that shows promise in
achieving one or more of the Act's objectives.
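The apportionment rule above gives each State the fraction of the annual appropriation equal to its share of the nation's minority-group school-aged children. A minimal Python sketch of that proportional rule follows; the function name and all State figures are hypothetical, and any other statutory provisions that might affect a State's allotment are deliberately left out.

```python
def apportion(total_funds, minority_children):
    """Split total_funds across States in proportion to minority-group counts.

    minority_children maps a (hypothetical) State name to its number of
    minority-group school-aged children. Each State receives
    total_funds * (State count / count in all States).
    """
    national = sum(minority_children.values())
    if national <= 0:
        raise ValueError("at least one minority-group child must be counted")
    return {
        state: total_funds * count / national
        for state, count in minority_children.items()
    }


if __name__ == "__main__":
    # Invented figures for illustration only; not taken from the report.
    counts = {"State A": 300_000, "State B": 150_000, "State C": 50_000}
    for state, amount in apportion(10_000_000, counts).items():
        print(f"{state}: ${amount:,.0f}")
```

With these invented counts, State A holds 60% of the children counted and therefore receives 60% of the funds ($6,000,000 of $10,000,000).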

Evaluation Objectives

The Act authorizes a national evaluation of its programs, which is supported
by an annual one-percent reservation of appropriated ESAA funds. As designed
by the U.S. Office of Education (USOE) and conducted by the System Development
Corporation (SDC), the national evaluation focuses on an integrated evaluation
of the ESAA Basic and Pilot programs and has the following general objectives:

• Determination of the short- and long-term national impact of the
program in terms of the Act's objectives, namely, reduction of
minority group isolation, elimination of discrimination, and
improvement of basic skills in elementary and secondary schools.

• Identification and description of the needs of students in or
from minority-isolated schools; the characteristics of local
programs, including the relationship of their resource allocation
to needs; and the interrelationships of those factors with program
impacts.

• Documentation and dissemination of information relating to unusually
successful local programs and program components that appear to be
related to success.

• Determination of the relative effectiveness of three forms of
educational intervention (desegregation, compensatory education, and their
combination) as compared to no special intervention in minority-isolated
schools.

• Investigation of the relationships among regular school expenditures,
supplementary ESAA expenditures, and program impact in an attempt to
determine local program cost-effectiveness and the minimum supplemental
expenditures necessary to ensure some measure of program success.
In an attempt to achieve these objectives, data are being collected from a
nationally representative sample of ESAA-funded school districts over a period
of two to three school years.

Evaluation Methods and Procedures

Data on achievement, school climate and discrimination, and reduction in minority
group isolation have been collected annually since school year 1973-74 in a
nationally representative sample of approximately 75 Basic and 42 Pilot elementary
schools and 54 Basic secondary schools in 85 ESAA-funded school districts.
Within each school in the evaluation, samples of approximately 60 students in
each of grades 3, 4, and 5 or 10, 11, and 12 were randomly selected across sections
within grade to participate in the evaluation. Students are followed longitudinally
through these grade bands, with grade 5 and 12 students leaving the sample each
year. In any one year there are approximately 27,000 students, 4,000 teachers,
172 principals, and 85 local ESAA coordinators, district business managers, and
superintendents in the evaluation sample.
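One reading of "randomly selected across sections within grade" is that all students in a grade are pooled, regardless of classroom section, before about 60 are drawn. The Python sketch below assumes that reading and a hypothetical roster mapping each section to its student IDs; the report does not spell out its exact sampling procedure here.

```python
import random


def sample_grade(roster, n=60, seed=None):
    """Draw a simple random sample of n students from one grade.

    `roster` is a hypothetical mapping of section (classroom) label to the
    list of student IDs enrolled in that section. Students are pooled across
    sections before sampling, so every student in the grade has the same
    chance of selection regardless of section.
    """
    pooled = [student for section in sorted(roster) for student in roster[section]]
    if len(pooled) <= n:
        # Grade smaller than the target: include every student.
        return pooled
    rng = random.Random(seed)
    return rng.sample(pooled, n)


if __name__ == "__main__":
    # Invented roster: three grade-4 sections of 30 students each.
    roster = {s: [f"{s}-{i:02d}" for i in range(30)] for s in ("4A", "4B", "4C")}
    chosen = sample_grade(roster, n=60, seed=1974)
    print(len(chosen), "students selected")
```

Because sampling ignores section boundaries, the number drawn from any one section varies from draw to draw; a design that required equal numbers per section would instead sample within each section.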

The selection procedures for schools within districts consisted of classifying
all ESAA-eligible schools in terms of estimated prior student achievement,
estimated socio-economic status of enrolled students, and percent and type of
minority composition. Pairs of matched schools (schools similar in the above
dimensions) were then randomly selected, and within each pair, one school was
randomly assigned to the treatment (ESAA funding) condition and the other school
to the control (no ESAA funding) condition. These procedures resulted in a true
experimental design with comparable treatment and control schools in each district.
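The within-pair randomization described above can be sketched in a few lines of Python. The sketch assumes the matching step has already produced the pairs; the function name and school labels are hypothetical.

```python
import random


def assign_matched_pairs(pairs, seed=None):
    """Randomly assign one school in each matched pair to treatment.

    `pairs` is a list of (school_a, school_b) tuples whose members are assumed
    to have already been matched on estimated prior achievement, estimated
    socio-economic status, and percent and type of minority enrollment.
    """
    rng = random.Random(seed)
    assignment = {}
    for school_a, school_b in pairs:
        # A fair coin flip decides which member of the pair is funded.
        if rng.random() < 0.5:
            treated, control = school_a, school_b
        else:
            treated, control = school_b, school_a
        assignment[treated] = "treatment"
        assignment[control] = "control"
    return assignment


if __name__ == "__main__":
    pairs = [("School 1", "School 2"), ("School 3", "School 4")]
    for school, condition in sorted(assign_matched_pairs(pairs, seed=1973).items()):
        print(school, condition)
```

Randomizing within pairs, rather than across all schools at once, guarantees that every pair contributes exactly one treatment and one control school, which is what makes the two groups comparable on the matching variables.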

At the beginning and end of each school year, mathematics and reading achievement
tests and questionnaires are administered to all students in the evaluation sample.
Monthly, a student attendance and exposure log is completed for or by each
student in the sample to obtain data on the types of activities students are
exposed to, and the frequency and duration of exposure to each activity. Near
the end of each school year, a battery of questionnaires is administered to
superintendents, district business managers, local ESAA coordinators, principals,
teachers, and students in the sample. These questionnaires provide data on district,
school, and classroom minority group isolation, program operation, resource
allocation, and student and staff background characteristics.
Data analysis will focus on the major objectives of the study through use of
instruments tailored to measure the Act's three major purposes. Annual analysis
will include comparison of pre-post change in outcome measures among treatment
and control schools, comparison of the relative effectiveness of different
intervention approaches, identification of unusually successful local projects, and
determination of the relationships between program characteristics and program
impact. In addition to the annual analyses, the cumulative impact of the program will
be determined on the basis of the longitudinal data collected from the sample.
Finally, the cost-effectiveness of the program at the local and national level will
be determined annually.
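The simplest form of the planned treatment/control comparison of pre-post change is a difference in mean gains. The Python sketch below computes only that quantity, with hypothetical function names and invented scores. It is not the evaluation's analysis, which the report describes as combining classical and Bayesian techniques, and it ignores matching, covariates, and the clustering of students within schools.

```python
from statistics import mean


def mean_gain(pre, post):
    """Average post-test minus pre-test score for one group of students."""
    if len(pre) != len(post) or not pre:
        raise ValueError("pre and post must be non-empty and the same length")
    return mean(after - before for before, after in zip(pre, post))


def gain_difference(treat_pre, treat_post, control_pre, control_post):
    """Treatment mean gain minus control mean gain (positive favors treatment)."""
    return mean_gain(treat_pre, treat_post) - mean_gain(control_pre, control_post)


if __name__ == "__main__":
    # Invented raw scores for three students per group.
    diff = gain_difference([30, 32, 35], [36, 37, 41], [31, 33, 34], [34, 35, 37])
    print(f"treatment gain minus control gain: {diff:.2f}")
```

In this invented example the treatment group gains about 5.67 points and the control group about 2.67, so the difference in gains is 3 points.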

Approximately seven months after post-test data collection each year, System
Development Corporation, in conjunction with USOE, will produce evaluation reports
summarizing ESAA impact. Each report succeeding the first will address the
subjects of cumulative impact and comparative impact after successive years of
program implementation.

Evaluation Design Features

The ESAA evaluation design has a combination of features that make it an advance
in the state of the art in national evaluation. Previous national evaluations
have included one or more of the design features of the ESAA study, but no other
study to date has integrated all of the following highly recommended evaluation
procedures: a sample representative of the population affected by the program;
annual pre-post data collection on impact measures; longitudinal data collection;
randomly selected schools and random assignment of treatment and control conditions;
three measures of impact directly related to the program's national objectives;
an achievement test restandardization that resulted in a supplementary set of
norms for minority-isolated schools, which will be used in conjunction with
existing national norms; use of an achievement test specially modified to reduce
its possible bias against minority students; and finally, a combination of
classical and Bayesian data analysis techniques. It is expected that the
particular combination of design features that constitute the national evaluation
of ESAA will yield less ambiguous results than previous national evaluations and
a firmer basis upon which Congress and the Administration can judge the ultimate
effectiveness of the Act.
EMERGENCY SCHOOL AID ACT ACHIEVEMENT TEST RESTANDARDIZATION

EXECUTIVE SUMMARY 



Respected members of the education, test development, evaluation, and minority
communities have at various times charged that existing standardized achievement
tests are inappropriate for the assessment of minority student academic
performance. In general, this charge is based upon the fact that minority group
students are often under-represented during two important phases of the test
development process, namely, test item selection and test standardization. As
a consequence of such minority group under-representation, standardized
achievement test items are said to be biased against minority students, and test
norms are said to be inappropriate for minority students and for schools with high
minority student enrollments. In short, these critics claim that existing
standardized tests are developed by, with, and for white middle-class
America.

A counter argument to the bias criticism, made by an apparently equal number of
qualified individuals, is that even granting that minority groups are
under-represented in most test development efforts, standardized achievement tests
have an important function in school systems regardless of their minority
concentration. Such tests provide a standard, albeit a middle-class white
American one, by which students and schools across the nation can be compared to
each other. According to this argument, achievement tests are a valid criterion
for assessing the ability of all students to achieve in our society.

Recognizing the apparent validity of both arguments, and realizing that debate
has yet to resolve the issue, it was decided early in the planning stages of the
Emergency School Aid Act (ESAA) national evaluation to develop an achievement
test that would satisfy the interests of both camps. The primary objective of
the activity was to select the best reading and mathematics achievement test
battery currently available for evaluation of the ESAA program and then to
improve the sensitivity, reliability, and validity of the battery for the
evaluation population: students in or from minority-isolated schools.*



The Emergency School Aid Act (ESAA) defines minority-isolated schools as 
schools with a minority enrollment of 50% or more. 






The major product of the activity would be a restandardized achievement test
with (a) norms for students in the nation's schools in general, (b) supplementary
norms for minority-isolated schools and students in such schools, and (c) two
scoring systems: the original scoring system and one that would be less biased
against minority students. If that product could be achieved, it would be
possible to assess the impact of the ESAA program using both scoring systems
referenced to both sets of norms. That is, a student's score or a school's mean
score, original and debiased, could be compared to the norms for schools and
students in general and to the supplementary norms for minority-isolated schools
and students enrolled in such schools.
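The two scoring systems differ only in which items are counted: the original system scores every item, while the debiased system drops the items judged biased. A minimal Python sketch follows, assuming right/wrong item scoring and hypothetical item IDs; it does not reproduce the CAT's actual scoring rules.

```python
def raw_score(responses, excluded_items=frozenset()):
    """Count correct responses, skipping any item IDs in `excluded_items`.

    `responses` maps a (hypothetical) item ID to True if answered correctly.
    Passing no exclusions gives the original score; passing the set of items
    judged biased gives the debiased score.
    """
    return sum(
        1 for item, correct in responses.items()
        if correct and item not in excluded_items
    )


if __name__ == "__main__":
    responses = {"R1": True, "R2": False, "R3": True, "R4": True, "R5": False}
    biased = frozenset({"R3"})  # invented example of an item removed as biased
    print("original:", raw_score(responses))
    print("debiased:", raw_score(responses, biased))
```

Because the debiased scale has fewer items (here 4 rather than 5), its raw scores are not directly comparable with original scores, which is one reason each scoring system needs its own norms.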

The restandardization process consisted of several steps, the most important of
which were test selection, test administration within a nationally representative
sample of minority-isolated schools, identification of items biased against
minority students and removal of such items from one of the test scoring
systems, and development of a set of supplementary norms for minority-isolated
schools and children in such schools. This technical report, the first major
product of the ESAA evaluation, discusses the procedures employed during
restandardization and the results of the effort. The following paragraphs
briefly summarize those activities and discuss the limitations and potential
usefulness of the product.

The test selection phase of restandardization consisted of a review of all 
existing standardized achievement tests and the selection of a pool of reading 
and mathematics subtests that appeared to be most appropriate for the ESAA 
evaluation. Criteria used in the initial screening process included the 
following: test appropriateness in terms of minority group representation in
the item selection and standardization phases of test development; extent of
apparent minority-group bias; relevance, interest, and meaningfulness to 
minority-group students; grade level and content relevance; administration time 
burden; and reliability, validity, and normed technical excellence. The pool 
of subtests remaining after initial screening on the basis of the selection 
criteria was then reviewed by an independent panel of test development experts. 
This panel, in conjunction with the USOE and the evaluation contractor, System
Development Corporation, finally selected the California Achievement Test,
1970 edition (CAT-70), Levels 2 and 3. The reading comprehension, vocabulary,
mathematics concepts, and computations subtests of the battery were selected for
restandardization. It should be noted that none of the tests reviewed fully met
any one, or all, of the selection criteria, and that practical considerations
resulted in the selection of the CAT over a few other tests that ranked as well
as the CAT.

Although the CAT was considered among the best tests on the basis of the
selection criteria, it, like all other existing standardized tests, suffered
from the fact that it was developed with, and standardized on, a sample of
students significantly different from the ESAA evaluation sample. The CAT, like
most other standardized tests, was designed for national use; therefore, minority
groups were represented in its item selection and standardization samples in
approximately the same proportion as the proportion of minority children in the
nation's schools. The ESAA evaluation sample, however, was expected to be composed
of well over 50% minority group students. Consequently, it was necessary to
restandardize the selected test battery on a nationally representative sample of
students and schools similar to those that would eventually be selected for the
ESAA evaluation sample, i.e., minority-isolated schools and students enrolled in
such schools.

The selected CAT subtests were administered to a random sample of 30 students in
each of grades 3, 4, and 5 in a nationally representative sample of 100
minority-isolated schools near the end of the 1972-73 school year. Approximately
9,000 students were tested. Data so collected served as the basis for the
identification of biased items and the development of achievement norms for
minority-isolated schools and students in those schools.

Administration of the CAT prior to ESAA program implementation provided information
for a national needs assessment of children in minority-isolated schools. Those
data indicated that students (minority and majority group members alike) in
minority-isolated schools have a significant need for remedial reading and
mathematics programs. The mean achievement in reading and mathematics of students
in minority-isolated schools hovered about the 20th percentile relative to existing
national norms; that is, approximately 80% of elementary students in the nation
achieve at a higher level than the average student in the restandardization
sample. That fact provides a clear and unambiguous indication of the national need
for programs such as ESAA, which are targeted at improving the basic skills of
children in or from minority-isolated schools.

The restandardization data were subjected to statistical analysis in an attempt
to identify test items that might be biased against minority students. A test
item was considered biased if it did not measure what it purported to measure
for both minority and majority group students. After statistical analysis,
suspicious items were reviewed for their content by a special panel of
minority-group experts from the fields of test development and education. The
panel was asked to review all items that statistical analysis suggested might be
biased and to reach a consensus on those items that panel members judged to be
truly biased against minority students. The items so identified were then removed
to form a special, supplementary, less biased test scoring system.
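The statistical screening step is not specified at this point in the report (Section V documents it). As a stand-in only, the Python sketch below uses Angoff's transformed item-difficulty ("delta plot") method, a well-known screen of this kind that is not necessarily the procedure used here. Each item's proportion correct in each group is converted to a delta, delta = 13 + 4z with z the normal deviate of the proportion failing; a major-axis line is fitted through the (majority, minority) delta pairs; and items lying unusually far from that line are flagged for panel review. The function names, item IDs, and the 1.5-delta threshold are assumptions for illustration.

```python
import math
from statistics import NormalDist, mean


def delta(p):
    """Angoff delta: 13 + 4z, where z is the normal deviate of 1 - p."""
    if not 0.0 < p < 1.0:
        raise ValueError("proportion correct must be strictly between 0 and 1")
    return 13.0 + 4.0 * NormalDist().inv_cdf(1.0 - p)


def delta_plot_flags(p_majority, p_minority, threshold=1.5):
    """Return item IDs whose deltas lie far from the major axis of the plot.

    Both arguments map the same (hypothetical) item IDs to the proportion of
    each group answering correctly. The threshold is an assumed cutoff.
    """
    items = list(p_majority)
    x = [delta(p_majority[i]) for i in items]
    y = [delta(p_minority[i]) for i in items]
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    if sxy == 0:
        raise ValueError("deltas are uncorrelated; major axis undefined")
    # Slope and intercept of the major (principal) axis of the scatter.
    b = ((syy - sxx) + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    a = my - b * mx
    flagged = []
    for item, xi, yi in zip(items, x, y):
        # Signed perpendicular distance from the point to the fitted line.
        distance = (b * xi - yi + a) / math.sqrt(b * b + 1)
        if abs(distance) > threshold:
            flagged.append(item)
    return flagged


if __name__ == "__main__":
    p_maj = {"M1": 0.9, "M2": 0.8, "M3": 0.7, "M4": 0.6, "M5": 0.5, "M6": 0.4, "M7": 0.3}
    p_min = dict(p_maj, M4=0.2)  # invented: M4 is far harder for the minority group
    print(delta_plot_flags(p_maj, p_min))
```

A statistical flag only marks an item as suspicious; consistent with the procedure described above, a flagged item would still go to the content review panel before being dropped from the debiased scoring system.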

Data collected during the restandardization were also subjected to standard
statistical analyses and scaling procedures, which resulted in the development
of achievement norms for minority-isolated schools and students enrolled in such
schools. Those norms will be used in conjunction with both scoring systems and
with existing national norms to determine the local and national impact of the
ESAA program during school years 1974, 1975, and perhaps 1976.

In evaluating the general significance and usefulness of the data reported
herein, the reader should be aware of several salient limitations imposed on the
restandardization by time constraints and the overall ESAA evaluation design. It
should be noted that the restandardized test was designed to assess only the
reading and mathematics achievement of students in or from minority-isolated
schools at grades 3, 4, and 5. The restandardized test is therefore inappropriate
for other student populations, grades, or subject matter areas. Further, since
an existing test was debiased and restandardized, rather than an entirely new
test being developed, the items in the test may not be the best possible items
for use with the ESAA subpopulation. Even the items left in the debiased test
were selected by the original test developer on the basis of tryout testing on a
sample of students in which minority groups were under-represented. Nevertheless,
those items in the test that were later identified as being biased
against minority students were eliminated from the restandardized test scoring
system. If practical constraints had not limited this effort, an entirely new
test would have been developed, and item selection could have been based upon item
tryouts with a group of students more representative of the user population, i.e.,
students in or from minority-isolated schools. The impact of that constraint
cannot be fully known unless such an ideal test construction effort is
undertaken and scores on the restandardized CAT are compared to those on the new
test.

In sum, as part of the national evaluation of the ESAA program, an existing
standardized achievement test was restandardized and debiased on the basis of
data collected from students enrolled in a nationally representative sample of
minority-isolated schools. Although there were practical constraints on the
completion of the restandardization, the results as described in this report
will be considered an advance in test development. Judgments as to the extent
of the advance and the general usefulness of the restandardization are left to
the reader. Nevertheless, it is encouraging to note that on July 5, 1974, the
National Association for the Advancement of Colored People (NAACP), at its 65th
Annual Convention in New Orleans, passed a resolution demanding a moratorium on
administration of any standardized test unless steps were taken to (a) include
a representative number of minority students in the standardization sample, and
(b) correct the test for biases against minority-group children. Such steps have
been taken in the restandardization described in the following pages.

Michael J. Wargo, Ph.D.

U.S. Office of Education 

ESAA Evaluation Program Officer 
I. INTRODUCTION

In response to the United States Office of Education (USOE) specifications for
the Emergency School Aid Act (ESAA) Pilot Program Evaluation, restandardization
testing activities were initiated in spring 1973. These activities were
directed toward the selection of an achievement test to be used in the national
evaluation, a pre-evaluation administration of this instrument to a sample of
ESAA-eligible students in minority-isolated schools, and the analysis of the data
obtained.

The restandardization testing was designed to meet several goals. The first
of these was to assess the academic needs of students in ESAA-eligible
minority-isolated schools prior to program implementation. The funding of ESAA Pilot
programs was predicated on the belief that minority isolation adversely affects
student achievement. The restandardization testing was intended to assess the
impact of minority isolation on reading and mathematics achievement.
Additionally, the needs assessment would establish national baseline achievement data
for students in ESAA-eligible minority-isolated schools. These data could then
provide a basis for studying changes in achievement patterns after program
implementation.

The second goal of the restandardization testing was to evaluate the adequacy
of the achievement measures for purposes of the evaluation. Standardized
achievement measures have often been accused of bias against minorities, since these
measures are typically developed for the majority population. To investigate
the characteristics of the instrument when used in a high-minority-enrollment
subpopulation, a research effort directed at the issue of item bias was initiated.
The purpose of this investigation was to determine whether there was evidence of
bias in the measure. If such bias was found in a subset of the items, the
biased items would not be used in scoring the measure for many purposes of the
evaluation. The resulting scales would be more appropriate and sensitive measures
of program impact.
A third goal, closely linked to the first two, was to provide a set of norms
for expressing the scores of students in ESAA-eligible minority-isolated schools
and for expressing school means. These norms would enable us to relate a
student's or school's achievement level to that of other students or schools
with similar characteristics. Such comparative norms would give a more
appropriate baseline for noting the relative positions of the treatment and control
schools in the evaluation, and would provide an interpretive scale for use with
the debiased measures.

This document discusses the collection and analysis of the restandardization
data. Sections II and III describe the selection of the achievement instruments
used and the selection of the restandardization sample. Sections IV, V, and VI
address the goals of the restandardization testing. Section IV provides a
descriptive analysis of the data resulting from the administration of the
achievement test and the attendant student background questionnaire. This analysis
describes the sample actually obtained for testing and documents the assessed
achievement levels, thereby providing both the needs assessment and the baseline
data.

Section V documents the research conducted to investigate the existence of
possible bias in the measures selected. The rationale and resulting methodology
for detecting item bias are described. Empirical data relevant to the
detection of biased items are provided, as well as the comments related to the
actual items identified as biased. The appropriate uses of the debiased scales,
as well as the implications of their use, are discussed.

Finally, Section VI presents the methodology employed for scaling the tests for
use in interpreting performance within the ESAA-eligible minority-isolated
subpopulation. Score-to-percentile-rank conversion tables are provided.
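A score-to-percentile-rank conversion table maps each possible raw score to the percentage of the norm group scoring below it. The Python sketch below builds such a table from a list of raw scores using the midpoint convention, PR = 100 x (number scoring below + half the number at the score) / N. That convention is one common choice and an assumption here; the report's own scaling methodology, including any smoothing and the separate treatment of school means, is not reproduced.

```python
from collections import Counter


def percentile_rank_table(raw_scores, max_score):
    """Map each possible raw score 0..max_score to a percentile rank.

    Uses the midpoint convention: PR = 100 * (count below + 0.5 * count at) / N.
    """
    if not raw_scores:
        raise ValueError("need at least one score")
    if any(s < 0 or s > max_score for s in raw_scores):
        raise ValueError("score outside 0..max_score")
    n = len(raw_scores)
    freq = Counter(raw_scores)
    table = {}
    below = 0
    for score in range(max_score + 1):
        at = freq[score]  # Counter returns 0 for scores nobody obtained
        table[score] = 100.0 * (below + 0.5 * at) / n
        below += at
    return table


if __name__ == "__main__":
    # Invented norm-group scores on a hypothetical 4-item scale.
    for score, pr in percentile_rank_table([0, 1, 1, 2, 2, 2, 3, 4], 4).items():
        print(score, pr)
```

For the invented scores, a raw score of 2 converts to a percentile rank of 56.25: three of the eight students score below 2, and half of the three students at 2 are credited as below.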
II. INSTRUMENT SELECTION

The first step in the restandardization testing activity was the selection of
the achievement measures to be used. The selection of the instruments was
subcontracted to UCLA's Center for the Study of Evaluation (CSE), because of their
extensive experience in the field of test evaluation, and was guided by criteria
specified by USOE. In order to implement a workable schedule for test selection,
examinee selection, and test administration within the limited time and resources
available between contract award and the end of the 1972-73 school year, initial
efforts were concentrated on selecting measures appropriate to the elementary
grade levels targeted for the evaluation. Secondary-level measures were selected
later. The following pages describe the process by which the measures were
selected.

The selection process began with a careful review of the criteria set forth for 
instrument selection. The following USOE criteria were to be used for test 
selection: 

• The selected achievement test battery should cover grades two 
through twelve (2-12).* 

• The selected battery must have reading (comprehension and vocabulary) 
and mathematics (concepts and computations) subtests for all grade 
levels. 

• The subtests should be independent of specific curricula; i.e., they 
should be basic skill tests. 

• The subtest levels should have some grade-level overlap to minimize
the possibility of floor and ceiling effects.

• Total administration time for the combined reading and mathematics 
tests should be approximately two (2) hours or less. 



*Only elementary-level tests were considered in the initial CSE ratings.
Selection of secondary-level tests was a separate task.
• The selected subtests should have minimum ethnic-group bias.

• Minority groups should be represented in the restandardization
sample.

• The selected subtests should have acceptable test reliability and
validity.

• Norms and scales of the battery should be adequate for the proposed
study.

• Subtest content should be relevant, interesting, and meaningful to
today's minority-group students.

• The subtests should be easily administered, scored, and processed.

A multi-stage strategy was adopted for selecting the elementary-level
achievement tests. First, a master list of potentially applicable tests was compiled.
This list was obtained primarily from CSE Elementary School Test Evaluations
(Hoepfner, Strickland, Stangel, Jansen, Patalino, 1970), one of the products of
CSE's Evaluation Technologies Program. To produce this book, CSE had amassed
a file of all published tests appropriate for the elementary school level. After
reaching agreement on criteria for judging the quality of tests, CSE had rated
over 2,500 instruments on these criteria and had published the results in the
book. As part of the rating procedure, tests had been placed into categories
corresponding to 145 goals of elementary education. The goals were intended to
represent all possible student outcome goals at the elementary school level.
Several goal areas closely match the outcome dimensions relevant to the ESAA
evaluation. The test list was assembled from the following CSE goal areas:

CSE Goal Area                               ESAA Outcome Dimension

Arithmetic Concepts                         Mathematics Concepts

Arithmetic Operations                       Mathematics Operations
Operations with Integers

Reading Comprehension                       Reading Comprehension
Understanding Ideational Complexes
Inference Making from Reading Selections

Recognition of Word Meanings                Reading Vocabulary
In addition to the tests contained in the CSE goal areas, several
instruments that were specifically designated by USOE but did not fall into these
goal areas were added to the list. The total list consisted of 66 separate
test batteries.

The initial list of 66 tests was then reduced by applying several absolute
cutting criteria. If a test did not meet one of these criteria, it was
immediately eliminated from further consideration by CSE. In order to stay in
the list of contenders, a test had to:

• be designed for group administration,

• have alternate forms,

• be amenable to machine scoring (e.g., optical scanning),

• have percentile or grade equivalent norms, and

• be a measure of achievement rather than of intelligence.
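Because the cutting criteria are absolute, failing any single one eliminates a test. That screening rule can be sketched in Python as a conjunction of yes/no attributes; the record layout and test names below are hypothetical and are not drawn from CSE's files.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateTest:
    name: str
    group_administered: bool
    alternate_forms: bool
    machine_scorable: bool
    percentile_or_grade_norms: bool
    measures_achievement: bool  # False for a measure of intelligence


# Every one of these must hold; failing any single one eliminates the test.
CUTTING_CRITERIA = (
    "group_administered",
    "alternate_forms",
    "machine_scorable",
    "percentile_or_grade_norms",
    "measures_achievement",
)


def apply_cutting_criteria(candidates):
    """Keep only the candidates that satisfy all of the cutting criteria."""
    return [c for c in candidates if all(getattr(c, f) for f in CUTTING_CRITERIA)]


if __name__ == "__main__":
    candidates = [
        CandidateTest("Test A", True, True, True, True, True),
        CandidateTest("Test B", True, False, True, True, True),  # no alternate form
    ]
    print([c.name for c in apply_cutting_criteria(candidates)])
```

Unlike the later relative ratings, these checks involve no weighting or judgment: a test either passes all five or is dropped.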

The application of these criteria led to a reduction of the initial list from
66 tests to the following 13 instruments:

• Bobbs-Merrill Arithmetic Achievement Series 

• California Achievement Test 

• Comprehensive Test of Basic Skills

• Contemporary Mathematics Test 

• Iowa Test of Basic Skills 

• Stanford Achievement Test 

• Wisconsin Contemporary Test of Elementary Mathematics

• Gray-Votaw-Rogers General Achievement Test

• Burnett Reading Series 

• Sequential Tests of Educational Progress 

• Nelson Reading Test 
• Metropolitan Achievement Test

• Science Research Associates Achievement Series

Each of the 13 remaining tests was then rated by CSE staff members on six
selection criteria representing important dimensions of test desirability.
These were:

• content/construct validity 

• examinee appropriateness 

• alternate-forms reliability

• curricular representativeness of test 

• distributional characteristics (as projected for ESAA sample)

• degree of freedom from ethnic bias 

After completing its ratings of the tests on the individual criteria, CSE nominated two mathematics tests (California Achievement Test and Sequential Test of Educational Progress) and three reading tests (California Achievement Test, Comprehensive Test of Basic Skills, and Stanford Achievement Test) as best meeting the needs of the ESAA Pilot Program Evaluation at the elementary school level. A packet of materials was then sent to each member of the Test Selection Panel*; this packet included the original list of 66 candidates, the cutting criteria, the relative rating criteria, and the tentative nominations. On March 17, 1973, a meeting of the Panel was held at SDC in Santa Monica. Participants included Ralph Tyler, Robert Hess, and Charles Thomas of the Test Selection Panel, along with representatives of SDC and CSE. At this meeting, CSE staff members reviewed the entire rating procedure with the panelists and discussed reasons for certain ratings given to several of the final contenders. The Test Selection Panel concluded the meeting with a recommendation that the final selections be made from the five finalists on the basis of practical considerations such as ease and speed of obtaining the necessary quantity of instruments. (Approximately 27,000 instruments were needed.) This recommended course of action was adopted and led to the selection of the California Achievement Tests in both reading and mathematics. The instruments finally selected were the CAT Level 2 and Level 3 subtests measuring Reading Comprehension, Reading Vocabulary, Mathematics Computations, and Mathematics Concepts.

*The Test Selection Panel is a group of consultants retained by SDC for purposes of screening the achievement instruments. Each of the panel members was selected because of his competency and experience in test theory and his awareness of the problems associated with testing in large-scale evaluations.

Because there was considerable concern for the amount of time that test administration would take, and because a student background questionnaire was to be administered in the same session, it was considered desirable to shorten the test battery. This shortening was effected by deleting certain subsections of the mathematics subtests at both levels. Specifically, the problems subsections were removed from both levels of the mathematics concepts subtests, and the subsection measuring computations involving fractions was removed from the computation subtest at Level 3. The removal of these subsections shortened the achievement battery sufficiently to allow administration within a single morning or afternoon session. It was felt that the remaining items in each subtest constituted a purer measure of the desired dependent variable.



III. SAMPLE SELECTION AND TEST ADMINISTRATION



The restandardization testing, which took place in May and June 1973, involved selecting a nationally representative sample of students in grades 3, 4, and 5 in schools with more than 50% minority enrollment, i.e., minority-isolated schools. Approximately 9,000 students in 100 schools across the United States were included in the sample. The following discussion describes how the universe to be sampled was defined and gives details on the procedures used for stratifying the universe, drawing the sample, and administering the test.

A. SAMPLE SELECTION 

1. Definition of Universe

Eligibility criteria, as established by the ESAA Pilot Program, were, first, that the school have more than 50% minority-group enrollment, and second, that the school be in a district with more than 50% minority-group enrollment on a district-wide basis (or, in the case of large districts, with a minority-group enrollment of at least 15,000). Schools meeting the first criterion (more than 50% minority-group enrollment in the school) are called "minority-group isolated." The term "minority group," according to the Act, means "(i) persons who are Negro, American Indian, Spanish-surnamed American, Portuguese, Oriental, Alaskan natives, and Hawaiian natives and (ii) ... persons who are from environments in which a dominant language is other than English and who, as a result of language barriers and cultural differences, do not have an equal educational opportunity."

A rough definition of the universe sampled, therefore, is that it consisted of all students in grades 3, 4, and 5 enrolled in May 1973 in schools eligible to receive ESAA Pilot awards. However, certain refinements were added. For example, handicapped students who were not testable under the same conditions as regular students had to be excluded in order for a standardized test battery to be used. This and other practical considerations led to the following definition.

The universe sampled included all students who in May 1973 had all of the following characteristics according to the most recent HEW Office of Civil Rights Survey*:

(1) Enrolled in regular grade 3, 4, or 5 classes or equivalent ungraded classes. (Special education classes were excluded.)

(2) Enrolled in a district in the continental United States with a total enrollment (according to last available figures) of 300 or more students. Since districts with fewer than 300 students are not systematically surveyed by the Office of Civil Rights (OCR), the reports on them are incomplete and unrepresentative. In addition, the per-student cost of testing in these districts would be very high. Only 1.2 percent of all U.S. elementary and secondary students were enrolled in 1971 in such districts. Therefore, these districts were excluded from the norms universe. Hawaii is not surveyed by OCR. Alaska was excluded for logistic reasons, as well as for its unique minority situation.

(3) Enrolled in a district whose minority enrollment (according to last available OCR figures) was either greater than 14,999 or greater than 50% of the district's total enrollment. (Note: The ESAA definition of "minority" includes several groups not specifically covered by OCR surveys, such as Portuguese, Alaskan natives, Hawaiian natives, and certain persons from non-English-speaking environments. No attempt was made to estimate the impact of this change in definition on minority enrollments as surveyed by OCR.)

(4) Enrolled in a minority-isolated school (actual condition in May 1973, whether previously reported or not). An attempt was made to use the ESAA definition of "minority" to determine current minority isolation of schools.

(5) Attending school and testable on the day the test was given. Absent students were not given make-up tests. Students were considered testable unless they had some handicap or severe language difficulty that interfered with their taking the test under the same conditions as regular students.

*The Office of Civil Rights Survey, conducted each school year, reports enrollment by district, and by school within district, indicating the actual numbers of students of various minority backgrounds at each grade level.

Estimates of districts, schools, and students in the universe as defined were made from the latest available OCR data. For the most part, Fall 1971 survey data were used. Preliminary 1972 data were used in a few cases to confirm or establish the universe membership of marginal districts.

A detailed analysis was made of OCR Report 71-441, which tabulates minority enrollments as of October 1971 in all districts surveyed. Within each district, all schools are listed in this report in descending order of minority percentage, and enrollments by type of minority are given. All minority-isolated schools in minority-isolated districts (or in districts with more than 14,999 total minority enrollment) were noted. For each such school, the number of students enrolled in grades 3, 4, and 5 (or in equivalent ungraded classes) was estimated. The estimate took into account the number of grade levels and the apparent proportion of 3rd, 4th, and 5th graders present in each school. Special estimating factors for schools reporting ungraded classes were based on other reported information such as grades present (e.g., an entry of "K456U" is assumed to have graded classes for kindergarten and grades 4, 5, and 6, and the ungraded equivalent of grades 1, 2, and 3). The universe so established for the continental United States totaled 1,506,751 students in 691 districts. Tables 1 and 2 indicate the proportion of the total student and district universes represented by these values.
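The grade-span estimate can be illustrated with a short Python sketch. It assumes, for illustration only, that every grade level in a school enrolls the same number of pupils, and it reads grade codes by the convention of the "K456U" example above ('K' = kindergarten, digits = graded classes, 'U' = ungraded classes standing in for the unlisted grades from 1 up to the highest graded grade). The function names are hypothetical, and the report's actual estimating factors may have differed.

```python
def expand_grade_code(code: str) -> set:
    """Return the grade levels a school serves; kindergarten is level 0.

    Assumed convention (from the "K456U" example): 'K' = kindergarten,
    digits 1-9 = graded classes, 'U' = ungraded classes covering every
    grade from 1 to the highest graded grade that is not listed. A code
    consisting only of 'U' is left unresolved here.
    """
    graded = {int(ch) for ch in code if ch.isdigit()}
    levels = set(graded)
    if "K" in code:
        levels.add(0)
    if "U" in code and graded:
        top = max(graded)
        levels.update(g for g in range(1, top + 1) if g not in graded)
    return levels


def estimate_grades_3_to_5(total_enrollment: int, code: str) -> float:
    """Share enrollment equally across levels (an assumption); keep grades 3-5."""
    levels = expand_grade_code(code)
    if not levels:
        return 0.0
    return total_enrollment * len(levels & {3, 4, 5}) / len(levels)


if __name__ == "__main__":
    # A hypothetical 700-pupil "K456U" school spans 7 levels, 3 of them targets.
    print(estimate_grades_3_to_5(700, "K456U"))
```

For the hypothetical 700-pupil school in the usage line, the sketch returns 300.0 (three of seven levels).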

2. Overall Design of Sample

A sampling plan was adopted for selecting 30 students at random across sections within each of the three grades in a stratified random sample of 100 minority-isolated schools, for a total of 9,000 students. This number and configuration of students was manageable within the limits of project resources and was judged by staff sampling specialists to be adequate for restandardization purposes.

Table 1. Students in U.S., in Restandardization Universe, and in Sample

Student Groups                                   Number            Percent of Total

Students in U.S. public schools                  45,428,464 (1)    100%

Students in districts with 300 pupils            44,886,914 (1)    98.8%
  or more

Students in 3rd, 4th, and 5th grades in          11,000,000 (2)    24.2%
  U.S. public schools

Students in minority-isolated schools in          6,250,000 (3)    13.8%
  districts in continental U.S. with 300
  pupils or more eligible for ESAA Pilot
  Program grants

UNIVERSE: Students in 3rd, 4th, and 5th           1,506,751 (3)    3.3%
  grades in minority-isolated schools in
  districts with 300 pupils or more in
  continental U.S. eligible for ESAA Pilot
  Program grants

SAMPLE: 90 students in each of 100                    9,000        (0.6% of universe
  randomly selected schools                                         of students)

Table 2. Districts in U.S., in Restandardization Universe, and in Sample

Districts                                        Number        Percent of Total

All districts in U.S.                            16,515 (1)    100%

Districts with 300 pupils or more                11,666 (1)    70.6%

UNIVERSE: Districts in continental U.S.             691 (3)    4.2%
  eligible for ESAA Pilot Program grants

SAMPLE: Districts containing 100 randomly            65        0.4%
  selected schools                                             (9.4% of universe of
                                                               districts)

Notes to Tables 1 and 2:

(1) Education Directory, 1972-73: Public School Systems, GPO, 1973.

(2) Statistics of Local Public School Systems, Fall 1969: Pupils and Staff, GPO, 1971.

(3) Estimates based on OCR Survey data.

A two-stage sampling process was used in which the primary units were schools and the secondary units were students. When a school did not have all three grades of interest (3, 4, and 5), it was combined with a complementary school in the same district to form a pair of schools with all three grades. A complementary school was needed when the originally selected school had only one or two of the three target grades; the complement was always a school that had grades either immediately higher or immediately lower than the originally selected school and that either received students from or sent students to the originally selected school each year. For this discussion, "school" will refer to a primary sampling unit which is either an individual school or a pair of schools.
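The rule for forming a primary sampling unit can be sketched in Python as follows. The `School` record, its `sends_to` field for feeder relationships, and the example schools are hypothetical; the sketch encodes only the conditions stated above (same district, feeder link, grades immediately above or below, and all three target grades covered by the pair).

```python
from dataclasses import dataclass, field
from typing import Optional

TARGET_GRADES = frozenset({3, 4, 5})


@dataclass
class School:
    # Hypothetical record for illustration; not the report's data layout.
    name: str
    district: str
    grades: frozenset
    sends_to: set = field(default_factory=set)  # names of schools receiving its pupils


def feeder_linked(a: School, b: School) -> bool:
    """True if either school sends pupils to the other each year."""
    return b.name in a.sends_to or a.name in b.sends_to


def form_primary_unit(school: School, others: list) -> Optional[tuple]:
    """Return a unit covering grades 3, 4, and 5, or None if none can be formed.

    A school with all three grades stands alone. Otherwise it is paired with
    a feeder-linked school in the same district whose grades sit immediately
    above or below its own and supply the missing target grades.
    """
    if TARGET_GRADES <= school.grades:
        return (school,)
    for other in others:
        if other.district != school.district or not feeder_linked(school, other):
            continue
        adjacent = (min(other.grades) == max(school.grades) + 1
                    or max(other.grades) == min(school.grades) - 1)
        if adjacent and TARGET_GRADES <= (school.grades | other.grades):
            return (school, other)
    return None
```

A K-3 school that sends its pupils to a 4-6 school in the same district would, for example, be paired with that school.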

Some reduction in costs could have been achieved by adopting a three-stage sample, where the first-stage units would be districts, the second-stage units schools, and the third-stage units students (for example, a stratified random sample of 50 districts, each with two schools and 180 students). The prime consideration in adopting a two-stage rather than a three-stage sample was that the component of sampling error associated with variation among schools is larger than the component for variation among students within schools. Since field staff were readily available for conducting tests in widely scattered locations, the cost and time savings from testing in fewer districts would not be sufficient to offset the lower statistical efficiency of a three-stage sample. Therefore, it was decided to reduce the between-school component of error by adopting a two-stage sample.
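The efficiency argument can be made concrete with the textbook variance approximation for a sample mean, as a sketch under simplifying assumptions: simple random selection at each stage, equal cluster sizes, negligible finite-population corrections, and stratification ignored. The symbols $S_d^2$ (variance between districts), $S_s^2$ (between schools within districts), and $S_w^2$ (between students within schools) are illustrative and were not estimated in the report. With 100 schools of 90 students drawn directly, the between-school term absorbs the district component:

$$
\operatorname{Var}_{\text{2-stage}}(\bar{y}) \approx \frac{S_d^2 + S_s^2}{100} + \frac{S_w^2}{100 \times 90}
$$

With 50 districts, 2 schools per district, and 90 students per school:

$$
\operatorname{Var}_{\text{3-stage}}(\bar{y}) \approx \frac{S_d^2}{50} + \frac{S_s^2}{50 \times 2} + \frac{S_w^2}{50 \times 2 \times 90}
$$

$$
\operatorname{Var}_{\text{3-stage}} - \operatorname{Var}_{\text{2-stage}} \approx \frac{S_d^2}{50} - \frac{S_d^2}{100} = \frac{S_d^2}{100} \ge 0
$$

Under these assumptions the three-stage design always costs precision, and the loss grows with the variation among districts; it pays only if the field savings are large enough to buy back that precision.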

3. Stratification

Considerations affecting the approach to stratification included the following: 

(1) As nearly as possible, every student in the defined universe should have the same chance of being in the sample. Therefore all strata should contain approximately equal numbers of students, so that random selections in one stratum would have the same probability as random selections in all other strata.

(2) Sampling error for the two-stage design selected should be estimated from a measure of variation among schools within strata. To have an unbiased estimate of sampling error, a minimum of two schools from each stratum was necessary. On the other hand, to derive as much benefit from stratification as possible, as many strata as possible were needed.

On the basis of these considerations, the universe was divided into 50 strata of approximately equal size (i.e., about 30,000 students each), and two schools were selected at random from each stratum to reach the desired total of 100 schools. Schools were selected with probability proportional to size (in estimated number of students in the defined universe), following generally accepted practice for multistage sampling. This procedure was statistically efficient in terms of the goal of minimum sampling error. Also, since schools were selected with probability proportional to size and since all students had nearly the same chance of being in the sample, the number of students selected for testing was approximately the same in each sample school, a desirable feature from the standpoint of test administration.
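The following Python sketch shows why selecting schools with probability proportional to size (PPS) and then taking a fixed number of students per school makes the sample nearly self-weighting. The report says only that schools were drawn with probability proportional to size using a table of random numbers; the systematic PPS method below is a standard technique swapped in for illustration, and all function names are hypothetical. The universe of 1,506,751 students divided into 50 strata gives about 30,135 students per stratum, close to the 30,000 used here.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def systematic_pps(sizes: list, k: int, rng: random.Random) -> list:
    """Return indices of k units drawn with probability proportional to size.

    Each unit owns a stretch of the cumulated size total as long as its size;
    k points spaced total/k apart, after a random start, land in the chosen
    stretches. A unit bigger than total/k could be hit twice, but schools of
    a few hundred pupils in a ~30,000-pupil stratum are far below that.
    """
    total = sum(sizes)
    step = total / k
    start = rng.uniform(0, step)
    upper_bounds = list(accumulate(sizes))
    last = len(sizes) - 1
    return [min(bisect_right(upper_bounds, start + j * step), last) for j in range(k)]


def student_inclusion_probability(school_size: int, stratum_size: int,
                                  schools_per_stratum: int = 2,
                                  students_per_school: int = 90) -> float:
    """P(school drawn) * P(student drawn | school drawn)."""
    p_school = schools_per_stratum * school_size / stratum_size
    p_within = students_per_school / school_size
    return p_school * p_within


if __name__ == "__main__":
    # School size cancels out: 2 * 90 / 30,000 = 0.006 for every school.
    for size in (250, 450, 900):
        print(size, round(student_inclusion_probability(size, 30_000), 6))
```

Because school size cancels, every student's chance of selection is about 2 × 90 / 30,000 = 0.006, i.e., 0.6%, which agrees with the sample's share of the universe in Table 1.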

Three kinds of information were readily available for stratifying schools: (1) geographic location, (2) ethnic composition, and to some extent (3) degree of urbanization. Information on size of school was used in the process of selection; that is, schools were selected with probability proportional to size. This was a more effective way of using school size information than using it for stratification.

Income level of the community was considered as a possible additional criterion for stratification, since it might have reduced variance slightly. However, income figures were not available in usable form. Furthermore, most schools in the survey population were presumably in low or low-to-medium income areas, so that the income range would be somewhat limited. Also, variation in income is probably associated with the three more readily available types of information; hence potential gains from using income-level data appeared to be moderate.

The objective of stratification was to minimize variation among schools within strata. Three criteria based on the three available types of information were used: (1) percent minority, (2) degree of urbanization, and (3) geographic region. Percent minority, as a characteristic of individual schools, was broken down into five categories (90-100%, 80-89.9%, 70-79.9%, 60-69.9%, and 50-59.9%). Urbanization and geography were considered together to produce several categories:

(a) metropolitan, i.e., located in a district that either exceeded 5,000 in measure of size (M.O.S.) in the defined universe or was in the same Standard Metropolitan Statistical Area as another district exceeding 5,000 in M.O.S.;

(b) non-metropolitan, non-Southern (Southern States were defined as Alabama, Arkansas, Florida, Georgia, Louisiana, Mississippi, North Carolina, South Carolina, Tennessee, and Virginia);

(c) non-metropolitan, Southern, medium (M.O.S. 800 to 5,000); and

(d) non-metropolitan, Southern, small (M.O.S. under 800).

In addition, geographic location from West to East was used as a general 
criterion for grouping schools together into strata after all other criteria 
had been applied. 

Application of these stratification criteria to the defined universe produced 
the arrangement of strata summarized in Table 3. In the 90-100% minority 
category there were enough students for 29 strata of 30,000 students each. 
Students in 90-100% minority schools constituted 58% of the defined universe. 
These 29 strata were established by arranging all 90-100% minority schools in a 
geographic-urban order and then counting off groups (strata) of approximately 
30,000 students. Boundaries were placed so that schools were not divided 
between strata; however, districts were frequently represented in more than 
one stratum. The geographic-urban order followed for the first 29 strata was:
Table 3. Summary of Strata of Schools for Restandardization Sample

Strata (30,000      Percent Minority in     Dominant Geography
Pupils Each)*       Individual Schools      of Schools

1-3                 90-100%                 West Coast Metropolitan
4-5                 90-100%                 Southwest Metropolitan
6-12                90-100%                 Midwest Metropolitan
13-18               90-100%                 Northeast Metropolitan
19-21               90-100%                 Mid-Atlantic Metropolitan
22-24               90-100%                 Southeast Metropolitan
25-26               90-100%                 Southwest Non-metropolitan
27                  90-100%                 Mid-Atlantic Non-metropolitan
28                  90-100%                 South Medium
29                  90-100%                 South Small

30-32               80-89%                  Metropolitan
33-34               80-89%                  Non-metropolitan

35-37               70-79%                  Metropolitan
38-39               70-79%                  Non-metropolitan

40-42               60-69%                  Metropolitan
43-44               60-69%                  Non-metropolitan

45-47               50-59%                  Metropolitan
48-50               50-59%                  Non-metropolitan

*There are 50 strata. Blank lines separate the percent-minority categories; within each category, successive rows shift in geography.
1. Metropolitan, West to East (strata 1-24)

2. Non-metropolitan, non-Southern, West to East (strata 25-27)

3. Non-metropolitan, Southern, medium (stratum 28)

4. Non-metropolitan, Southern, small (stratum 29)
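The counting-off step (lay the schools out in geographic-urban order, then cut the running total of pupils into groups of roughly 30,000 without dividing any school) can be sketched in Python. Assigning each school to the stratum in which the midpoint of its stretch of the running total falls is one reasonable boundary rule, assumed here; the report does not say exactly how boundary schools were placed. As a check on the figures in the text, 29 strata of 30,000 students is 870,000 students, about 58% of the 1,506,751-student universe.

```python
def count_off_strata(sizes: list, n_strata: int) -> list:
    """Assign ordered schools to n_strata consecutive groups of near-equal size.

    `sizes` must already be in the chosen geographic-urban order. A school
    goes to the stratum where the midpoint of its share of the running total
    falls, so no school is split between strata.
    """
    target = sum(sizes) / n_strata
    labels, running = [], 0
    for size in sizes:
        midpoint = running + size / 2
        labels.append(min(int(midpoint // target), n_strata - 1))
        running += size
    return labels


def stratum_totals(sizes: list, labels: list, n_strata: int) -> list:
    """Pupils per stratum, to check how close each comes to the target."""
    totals = [0] * n_strata
    for size, label in zip(sizes, labels):
        totals[label] += size
    return totals
```

Because strata are cut from a single ordered list, districts can straddle strata while individual schools never do, as the text describes.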

A similar geographic-urban order was followed for the remaining 21 strata. However, as Table 3 shows, there were only five or six strata for each minority percentage category below 90%, so that geographic-urban homogeneity of equally sized strata became more difficult to achieve. Nevertheless, each stratum was made up of schools that were rather similar to each other in the stratification criteria used: percent minority, degree of urbanization, and geographic location. According to the numbers of strata occupied by the five different percent-minority categories, approximately 84% of the students in the universe were from minority groups.
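The 84% figure can be reproduced from Table 3 by weighting the midpoint of each percent-minority category (an assumption, since only category ranges are given) by the number of equal-size strata it occupies: 29, 5, 5, 5, and 6 strata for the 90-100%, 80-89.9%, 70-79.9%, 60-69.9%, and 50-59.9% categories.

$$
\frac{29(95) + 5(85) + 5(75) + 5(65) + 6(55)}{50}
= \frac{2755 + 425 + 375 + 325 + 330}{50}
= \frac{4210}{50} = 84.2\%
$$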

4. Selection of Schools

After all schools in the universe had been distributed into the 50 strata described above, two schools were randomly selected from each stratum, using a table of random numbers. In order to verify that the 100 schools so selected on the basis of 1971 data were still in the defined universe, figures for each selected school were checked against 1972 OCR Survey forms. Two of the 100 schools were discovered to have changed characteristics sufficiently to fall outside the universe. The two schools were deleted from the sample, and substitutes were drawn at random from the same stratum.

After this adjustment, the 100 schools in the sample were tabulated by district. Forty-nine districts were represented by one school each, eleven by two schools, one by three schools, one by four schools, two by six schools, and one by ten schools. Thus 65 separate districts were included in the sample.
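Tallying the district breakdown is a quick consistency check: it accounts for all 100 sampled schools and gives 65 districts, the count shown in Table 2 and the number of letters sent to superintendents. The dictionary below simply transcribes the breakdown from the text.

```python
# Schools per district -> number of districts with that many schools (from the text).
schools_per_district = {1: 49, 2: 11, 3: 1, 4: 1, 6: 2, 10: 1}

n_districts = sum(schools_per_district.values())
n_schools = sum(k * n for k, n in schools_per_district.items())
print(f"{n_districts} districts, {n_schools} schools")  # 65 districts, 100 schools
```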
On March 23, 1973, after receiving USOE approval of the selected sample of schools, the following letters signed by the Assistant Commissioner for Planning, Budgeting, and Evaluation were mailed:

• Sixty-five letters to the superintendents of the districts containing the 100 selected schools. These letters gave background information and requested cooperation in the restandardization activity.

• Twenty-four letters to the chief school officers of the States in which districts had been selected for restandardization testing. Each letter enclosed a copy of the letter to superintendents, along with a list of the districts and schools selected in the State. In each State, a copy was also sent to the coordinator of the State's Committee on Evaluation and Information Systems.

• Eight letters to the commissioners of the HEW Regions in which districts had been selected for restandardization testing. Each letter enclosed a copy of the letter to superintendents, along with lists of the districts and schools selected in the Region. In each Region, a copy was also sent to the BEEO Regional senior program office.

Sample copies of these letters are provided in Appendix A of this report. 

The March 23 letters to superintendents requested a phone call to the USOE Project Officer naming a contact for concluding arrangements. In most instances, replies were favorable, straightforward, and reasonably prompt. In these cases, SDC called back each contact to verify permission, to inquire about standardized tests being given this year in the selected schools, and to determine whether complementary schools were needed.






The first 30 districts contacted were asked about tests being used, in order to judge whether the California Achievement Test (CAT) battery chosen for restandardization would have recently been given to the same students. Only six districts indicated that the CAT was being used this year in the schools selected for the sample; in most of these cases, only one grade was being tested. Other tests frequently mentioned were the Comprehensive Test of Basic Skills (9 mentions), the Metropolitan Achievement Test (6 mentions), and the Iowa Test of Basic Skills (4 mentions).

Of the 100 schools selected, 17 refused to permit testing. Reasons for failure to cooperate were varied and included dislike of testing, dislike of ESAA, recent testing in the school selected, and disagreement with restandardization objectives. Substitutes were arranged for 16 of the 17; testing took place in 99 schools. Nine of the substitutes (in five districts) were arranged in the same districts as the originally selected schools; the other eight schools were arranged in seven other districts. In all cases, substitutes were chosen from the same stratum as the original school and were as close as possible in measure of size to the original school.
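The substitution rule (same stratum, measure of size as close as possible to the refusing school) amounts to a nearest-neighbor choice within the stratum. The Python sketch below encodes only those two stated conditions; the `Candidate` record and the `already_used` parameter are hypothetical, and ties go to whichever eligible school appears first in the pool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    # Hypothetical record; `mos` is the school's measure of size.
    name: str
    stratum: int
    mos: int


def choose_substitute(refusing: Candidate, pool: list, already_used=frozenset()):
    """Pick the school in the same stratum whose measure of size is closest.

    Returns None when the stratum has no other unused school.
    """
    eligible = [c for c in pool
                if c.stratum == refusing.stratum
                and c.name != refusing.name
                and c.name not in already_used]
    if not eligible:
        return None
    return min(eligible, key=lambda c: abs(c.mos - refusing.mos))
```

When one large district fills an entire stratum, as described next, every eligible substitute necessarily comes from that same district.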

Several of the large school districts drawn into the sample occupied one or more entire strata. Substitutes in these cases could only be made within the same district (following the rule that substitutes would always be drawn from the same stratum as the originally selected school). This situation led to prolonged and only partially satisfactory negotiations with one school district; hence, the number of schools tested fell one short of the desired 100. Table 4 gives the distribution by State of districts and schools in the sample, after all substitutions had been made.
Table 4. Number of Districts and Schools in the Final Restandardization Sample

                          Number of     Number of
State                     Districts     Schools

Alabama                        4             4
California                     5            12
District of Columbia           1             3
Florida                        5             8
Georgia                        2             2
Illinois                       1             6
Indiana                        1             1
Louisiana                      1             2
Maryland                       1             2
Massachusetts                  1             1
Michigan                       1             2
Mississippi                    3             3
Missouri                       2             3
New Jersey                     7             8
New York                       2            10
North Carolina                 3             3
Ohio                           3             4
Pennsylvania                   2             3
South Carolina                 5             6
Tennessee                      1             1
Texas                         10            12
Virginia                       2             2
Wisconsin                      1             1

TOTAL                         64            99
5. Within Schools

Because of time constraints prior to testing, it was not feasible to get a list of students from each school in the sample so that students could be randomly selected at a central location. Procedures were needed to avoid bias in the selection process. Using one intact class as a sample of a grade composed of more than one class would have been statistically inefficient. Also, nonrandomness might have entered if selection were done by persons not acquainted with sampling pitfalls.

The solution adopted contained provisions for a variety of circumstances. If, for example, the total number of students in all the school's classes at a given grade level was less than 31, then all students in that grade were selected. When the number in a grade exceeded 30, students were systematically selected from separate classes according to their last initials, as determined by an alphabetic scheme which rotated letters in alternating patterns from class to class. Procedures were also developed to randomly increase or decrease the sample for a given grade in a school in order to keep the number tested close to 30 per grade.
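The letter-rotation scheme is not described in enough detail to reproduce, so the Python sketch below swaps in a related standard technique, systematic selection with a random start over the grade's combined class rosters. It keeps the two stated rules: take every student when a grade has 30 or fewer, and otherwise take exactly 30 spread across all classes. The function name and roster contents are hypothetical.

```python
import random


def select_grade_sample(class_rosters: list, rng: random.Random,
                        target: int = 30) -> list:
    """Systematic sample of `target` students spread across all classes.

    Rosters are chained class after class, so selection points fall at
    different positions in successive classes rather than always picking
    the same place on every class list.
    """
    roster = [student for cls in class_rosters for student in cls]
    if len(roster) <= target:
        return roster  # fewer than 31 students: test the whole grade
    step = len(roster) / target  # > 1, so every pick is a different student
    start = rng.uniform(0, step)
    picks = (min(int(start + j * step), len(roster) - 1) for j in range(target))
    return [roster[i] for i in picks]
```

Unlike using one intact class, every student in the grade has the same chance, target / N, of selection.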

In summary, testing actually took place in 64 of the 691 districts identified as the appropriate population of ESAA-eligible minority-isolated schools. Ninety-nine separate school units were involved. A total of 8,999 students, approximately evenly divided among the three target grade levels, were tested. This sample represents a two-stage sampling strategy in which each student in the estimated population of 1,506,751 eligible students had nearly the same chance of being in the sample.

B. TEST ADMINISTRATION

A second instrument was administered at the same time as the achievement measure to gather information related to the background of the students sampled. This instrument, the Student Background Questionnaire, asked for information related to ethnic self-identification, measures of socioeconomic level (as reflected in household possessions), language spoken in the home, and home educational experiences. A copy of this questionnaire is included as Appendix B.
Test forms were prepared as machine-scannable forms and were coded with school identification, student name, and a student identification number. The student questionnaire was also coded with student name and identification number for the purpose of matching responses on the two instruments for later analysis.
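Matching the two instruments is an inner join on the student identification number: only students present on both instruments are kept, which corresponds to the "Combined" column of Table 5. A minimal Python sketch, assuming each instrument's records have been read into a dictionary keyed by that number; the field names are hypothetical, and matching on student name is not modeled.

```python
def match_instruments(test_records: dict, questionnaire_records: dict) -> dict:
    """Keep students present on both instruments, merging their fields.

    Both inputs are keyed by the student identification number coded on
    the forms; records with no partner on the other instrument are dropped.
    """
    common_ids = test_records.keys() & questionnaire_records.keys()
    return {sid: {**test_records[sid], **questionnaire_records[sid]}
            for sid in sorted(common_ids)}
```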

Students in 99 elementary schools were tested during May and June 1973. In order to create a standardized testing situation, the actual administration of the achievement measures was subcontracted to American College Testing Program. Professional test administrators were used to ensure uniformity of administration and stricter adherence to the time schedule than might be possible using classroom teachers or other school personnel.

American College Testing (ACT) conducted special training sessions in which test supervisors were introduced to the materials and procedures which were to be used in the administration. Supervisors were selected who were familiar with the region and had prior testing experience. After training, each supervisor contacted individuals from the local communities of each test site to serve as proctors during testing. Supervisors and their proctors were selected to reflect as closely as possible the ethnic balance of the local school populations. This team then conducted the entire administration in a standardized setting. An example of a typical administration is given below:



Typical Administration Schedule

8:00 a.m.        Supervisors and proctors arrive at the school.

8:15 - 9:15      Supervisors review testing procedures and brief proctors
                 on last-minute details. Testing facilities are checked.
                 Student sample is selected and students are called to test
                 site. Introductions.

9:15 - 11:30     Student Background Questionnaire and test administration.

11:30            Approximate end of test administration. Supervisors
                 check materials, package answer sheets, and fill out
                 report of testing irregularities (if any).
In order to meet the first goal of the restandardization testing (i.e., assessment of the academic needs of students in ESAA-eligible minority-isolated schools) and to describe the sample in sufficient detail to support further analysis, a descriptive analysis was initiated. This chapter first describes the sample of students actually tested. Second, characteristics of the sample, as gathered in the Student Background Questionnaire, are described. Third, the achievement levels of the sample are reported. Finally, student achievement levels and background characteristics are summarized.

STUDENT SAMPLE 

The original sampling design called for the testing of approximately 9,000 students, 3,000 in each of the three grade levels of interest. Within the 99 schools actually participating, the breakdown of the 8,999 students tested is shown in Table 5.



Table 5. Achievement Test and Background Questionnaire Respondents

Grade             Achievement    Background Questionnaire    Combined

3                 3,025          2,500                       2,500
4                 3,011          2,461                       2,461
5                 2,963          2,461                       2,461

TOTAL STUDENTS    8,999          7,422                       7,422


The smaller number of Student Background Questionnaires results primarily from the refusal of some school districts to administer that instrument, although there were several cases of individual nonrespondents within schools where the questionnaire was administered. In particular, several school districts in California declined to participate in the questionnaire portion of the restandardization testing.

One large city district allowed certain of the items to be administered during the testing session, and the rest to be administered only via a questionnaire
included in the data base, and the school was counted as having cooperated.

Additional testing irregularities occurred in certain grade levels in three schools. The net effect of these irregularities was to disrupt the desired standardized testing situation. Since the results of these disturbances were unpredictable, the data from these grades for the three schools were removed from further analysis.

In order to ascertain whether the students who responded to the questionnaire
were systematically different from those who did not, the two groups were
compared on the only measure available that was common to both groups: the
achievement test. Performance on each subtest was compared for the two groups.
As is seen in Table 6, differences in achievement levels between the two groups
were not significant. For this reason, and because it was desirable to be able
to associate student background with performance, all subsequent analyses were
performed on that subset of students for whom both achievement tests and
background questionnaires were available.
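The report states only that the differences were not significant; it does not name the statistic used. A minimal sketch of one such comparison, assuming a two-sample test of subtest means (Welch's t, with a large-sample normal approximation for the p-value), follows. The function name and the scores are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's two-sample t statistic with a large-sample two-sided p-value.

    The p-value uses the standard normal distribution in place of Student's t,
    which is a reasonable approximation only when both groups are large.
    """
    se = math.sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    t = (mean(group_a) - mean(group_b)) / se
    p_two_sided = math.erfc(abs(t) / math.sqrt(2))  # equals 2 * P(Z > |t|)
    return t, p_two_sided


# Hypothetical subtest raw scores for questionnaire respondents and non-respondents.
respondents = [22, 25, 19, 30, 27, 24, 21, 26] * 50
non_respondents = [23, 24, 20, 29, 26, 25, 22, 25] * 20
t, p = welch_t(respondents, non_respondents)
print(f"t = {t:.3f}, p = {p:.3f}")  # "significant" at the .01 level only if p < .01
```

With groups of several hundred students each, the normal approximation differs negligibly from Student's t; for small groups the exact t distribution should be used instead.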
STUDENT BACKGROUND CHARACTERISTICS

The twelve items on the Student Background Questionnaire focus, for the most
part, on characteristics of the student's home environment. These
characteristics include the number of people the student lives with, the
language spoken at home, whether or not there is someone at home who can and
actually does help him/her with school work, the availability of various types
of reading materials, appliances, and convenience items, and the number of
hours per day the student spends watching television. In addition, there are a
few questions concerned with the ethnic self-identification of the student and
his/her prior educational history (i.e., the number of different schools
attended and the length of attendance at the present school).

The marginals for all items on the Student Background Questionnaire were
computed for each of the three grades in the restandardization sample. Since
the pattern of responses and the views expressed by the majority of the
students in each grade were quite similar, the students' background
characteristics for the restandardization sample are presented here in a
general, descriptive manner, with notable exceptions mentioned. Item-by-item
breakdowns of responses are given in Appendix B for all grades combined and for
each grade level. Appendix C contains item-by-item breakdowns of responses
across all grade levels for each of the four modified ethnic categories defined
below. Significant differences (α = .01) in response patterns between ethnic
groups were found for all items on the Student Background Questionnaire except
question #7 and part G (tape recorder) of #11. These differences are described
in Table 7.
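The report does not say which test produced these significance results. For categorical questionnaire responses, a Pearson chi-square test of independence on the ethnic-group-by-response table is a common choice; the sketch below assumes that test. The p-value routine uses the power series for the regularized incomplete gamma function, which is adequate for the moderate statistic values met here. Function names and counts are illustrative only.

```python
import math


def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


def chi_square_p_value(stat, df):
    """Upper-tail probability of a chi-square variable, computed as 1 - P(df/2, stat/2)
    where P is the regularized lower incomplete gamma function (series expansion)."""
    a, z = df / 2.0, stat / 2.0
    if z <= 0:
        return 1.0
    term = total = 1.0 / a
    k = 1
    while term > total * 1e-15:
        term *= z / (a + k)
        total += term
        k += 1
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return max(0.0, 1.0 - lower)


# Hypothetical counts: rows are the four ethnic groups, columns are "yes"/"no"
# answers to a single questionnaire item.
counts = [[880, 620], [300, 230], [330, 50], [60, 20]]
stat, df = chi_square_independence(counts)
p = chi_square_p_value(stat, df)
print(f"chi-square = {stat:.1f}, df = {df}, significant at .01: {p < 0.01}")
```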

The original ethnic groups included in item #9 were Black (Negro), Oriental
(Japanese, Chinese, etc.), American Indian, White, and Other (Eskimo, Hawaiian,
etc.). Item #10 also asked the students if they considered themselves to be of
a Spanish background (Mexican, Cuban, Puerto Rican, Latin American). Since the
original marginals indicated that there were only 1.2% Oriental students and
3.4% American Indian students, these categories were merged with the "Other"
category for purposes of further analysis. Also, Spanish-background students
were considered to be an important separate ethnic group. Although many
students were of Spanish background (as they indicated in item #10), their
responses to item #9 were scattered throughout the five ethnic groups. (The
percentages for item #9 in Appendix C illustrate this point.)

Table 7. Significant Differences Between Groups on
Student Background Questionnaire Items (Sheet 1)

Item       Differences Between Groups

1          Whites are more often new to the school. Blacks and
           Spanish-background students are more likely to have been at
           their present school since kindergarten or first grade.

2          Black, Spanish-background, and Other students more often have
           gone to only one school since kindergarten than have White
           students. White students are more likely to have gone to three
           schools.

3          White students are more likely to live with three to four other
           people at home than are Black, Spanish-background, and Other
           students. These students are more likely to live with seven or
           more other people than are White students.

4          Very slight difference between the ethnic groups. White and
           Other students tend to have done more reading at home (not
           related to school work) in the past two weeks.

5          White and Other students are more likely to have done school
           work at home during the past two weeks than are Black or
           Spanish-background students.

6          Spanish-background students are least likely to have anyone in
           their homes who can help them with their school work.

7          No significant differences.

8          Black students are most likely to speak only English at home,
           with White students slightly less likely. Other, and
           particularly Spanish-background, students tend to speak another
           language at home besides English: Spanish for the
           Spanish-background students and Chinese, an American Indian
           language, or some other language for the Other students.

11A        Other and Spanish-background students are less likely to have a
           daily newspaper in their homes.

11B,C,E,H  Whites are most likely and Spanish-background students are least
           likely to have a dictionary, encyclopedia, magazines, and color
           television in their homes.

Table 7. Significant Differences Between Groups on
Student Background Questionnaire Items (Sheet 2)

Item       Differences Between Groups

11D        White students are most likely and Black students slightly less
           likely to have story books in their homes, while
           Spanish-background […]

11F        Black students are most likely to have a record player in their
           homes, with […] least likely.

11G        No significant differences.

11I        White and Other students are most likely to have a typewriter
           in their homes. Spanish-background students are least likely
           to have one in their homes.

11J        White students are most likely and Black students are least
           likely to have a dishwasher in their homes.

11K        White students are more likely than any of the other groups to
           have two or more cars or trucks that run.

11L        White students are most likely and Black and Spanish-background
           students least likely to have an automatic clothes dryer in
           their homes.

11M        Other students are most likely and Spanish-background students
           are least likely to have a special place to study.

12         Black students are most likely to watch television more than
           three hours a day.

In order to make the ethnic distinctions very clear, the Spanish-background
students were removed from the Black, Oriental, American Indian, White, and
Other categories and were merged together to form the Spanish-background ethnic
group. Thus, four principal ethnic groups resulted from this process: Black,
White, Spanish-background, and Other. Table 8 gives the percentage of each of
these ethnic groups in the sample, for all grades combined and by grade level.
These four ethnic categories were utilized in all the analyses of the
achievement data and the Student Background Questionnaire data.
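The recoding just described amounts to a simple rule in which a Spanish-background answer on item #10 overrides the item #9 response. A sketch, with hypothetical response labels standing in for the questionnaire's actual codes:

```python
def modified_ethnic_group(item9_response: str, item10_spanish_background: bool) -> str:
    """Map questionnaire items #9 and #10 to the four analysis categories.

    The string labels are hypothetical stand-ins for the questionnaire codes.
    """
    # A Spanish-background answer on item #10 takes precedence over item #9.
    if item10_spanish_background:
        return "Spanish background"
    if item9_response in ("Black", "White"):
        return item9_response
    # Oriental, American Indian, and the original "Other" are pooled.
    return "Other"


print(modified_ethnic_group("White", True))             # Spanish background
print(modified_ethnic_group("American Indian", False))  # Other
```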

In the restandardization sample, 83% were minority students. The largest single
group of students were Black (60%), followed by Spanish-background students
(21%), White students (15%), and Other students (3%). The validity of the
sampling techniques is supported by the correspondence between the estimated
(84%) and actual (83%) minority representation.

Several questions dealt with various aspects of the home environment. Most
frequently (31-34%), the students live in a home with five or six other people.
However, White students are more likely to live with three or four other people
at home than are other groups, while minority students are more likely to live
with larger families of seven or more other people. Generally, only English is
spoken at home (74%). However, 68% of the Spanish-background students speak
Spanish at home. For students in the Other category, Chinese (8.4%), some
other language (5.0%), and an American Indian language (5.6%) are sometimes
spoken.

Most of the students (75%) stated that they had done reading unrelated to school 
work at home in the two weeks prior to testing. White (76%) and Other (71%) 
students are more likely to have done school work at home in the two weeks 
prior to testing than are Black (66%) or Spanish-background (66%) students. 
Table 8. Percentages for the Modified Ethnic Categories

                                   Grade Level
Ethnic Group          Combined      3      4      5
White                      15%    16%    16%    14%
Black                      60%    58%    60%    62%
Spanish Background         21%    22%    21%    21%
Other                       3%     4%     3%     3%
Although it appears that almost all of the students can get help at home with
their school work (93%), the Spanish-background students can get help least
often (88%). Fewer students (53%) had actually received help during the two
weeks just prior to testing.

Several valuable resource materials and conveniences are available in the homes
of the students. There are differences in response patterns between the
various grade levels and the ethnic groups for this item. In all grades, the
four most frequent materials, in descending frequency, are: a record player
(89%), story books (83%), a dictionary (78%), and magazines (70%). The last
three support the contention that students do reading at home unrelated to
school work.

After the first four items, students in different grades differ in the order 
of frequency of certain materials in their homes. Table 9 specifies the order 
of frequency for each of the three grades. 

Students in all three grades agree on the four items that are least frequently 
found in their homes; the lack of these items seems to indicate the
socioeconomic status of the sample students' families. These four items, in
descending frequency, are: two or more cars or trucks that run, a typewriter, 
an automatic clothes dryer, and an automatic dishwasher. These items could be 
classified more as luxury items than as necessities. 

The minority students agree that the four most frequent items in their homes 
are: a record player (83-91%), story books (75-85%), a dictionary (69-80%), 
and magazines (65-70%) . The White students indicated the same order except that 
story books (88%) were most frequent, followed by a record player (87%). After 
these items, the frequency of the next seven items differs for all the races. 
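The "order of frequency" reported in Tables 9 and 10 is a ranking of items by the percentage of students reporting them. A minimal sketch, assuming that tied percentages share a rank (the published tables mark ties explicitly but do not state their exact convention); the function name is hypothetical:

```python
def frequency_order(percent_by_item):
    """Rank items from most to least frequent; equal percentages share a rank."""
    # sorted() is stable, so tied items keep their input order.
    ordered = sorted(percent_by_item.items(), key=lambda pair: pair[1], reverse=True)
    ranks = {}
    previous_pct, previous_rank = None, 0
    for position, (item, pct) in enumerate(ordered, start=1):
        rank = previous_rank if pct == previous_pct else position
        ranks[item] = rank
        previous_pct, previous_rank = pct, rank
    return ranks


# White students' two most frequent items, using the percentages quoted above.
print(frequency_order({"Record player": 87, "Story books": 88}))
# {'Story books': 1, 'Record player': 2}
```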
Table 9. Order of Frequency of Materials in the Home by Grade Level

Order  Grade 3                     Grade 4                     Grade 5
1      Record player               Record player               Record player
2      Story books                 Story books                 Story books
3      Dictionary                  Dictionary                  Dictionary
4      Magazines                   Magazines                   Magazines
5      Special place to study      Daily newspaper             Daily newspaper
6      Daily newspaper             Encyclopedia                Encyclopedia
7      Color TV                    Tape recorder               Color TV
8      Encyclopedia                Color TV                    Special place to study
9      Tape recorder               Special place to study      Tape recorder
10     Two or more cars or trucks  Two or more cars or trucks  Two or more cars or trucks
11     Typewriter                  Typewriter                  Typewriter
12     Automatic clothes dryer     Automatic clothes dryer     Automatic clothes dryer
13     Automatic dishwasher        Automatic dishwasher        Automatic dishwasher



Table 10 presents the order of frequency for each of the ethnic groups.
Items such as a color TV and two or more cars that run are more frequent in the
homes of White students than in the homes of minority students. All groups
seem to have relatively easy access to a daily newspaper and an encyclopedia.

For all ethnic groups the two least frequent items are an automatic clothes
dryer (27-46%) and an automatic dishwasher (10-23%). Although these are
luxury items, a much larger percentage of White students respond that these
items are present in their homes than do minority students.

In general, White students are more likely to have each of the items listed in
question #11 in their homes, whereas Spanish-background students are the least
likely.

Students frequently responded (50%) that they watch television more than
three hours per day. The Black students are most likely to watch television
this amount. The least frequent response was no television watching at all (7%).
The fourth- and fifth-grade students reported watching more television than the
third graders.

Finally, minority students (28-31%) are more likely than White students (19%)
to have gone to the same school since preschool or kindergarten. White
students are more likely (24%) to be new this year at the school. In
correspondence to item #1, the results of item #2 indicate that minority students
are more likely (41-46%) than White students (32%) to have gone to only one
school since kindergarten.






Table 10. Order of Frequency of Materials in the Home
by Ethnic Group (Combined Grades)

Order  Black              White              Spanish            Other

1      Record player      Record player      Record player      Record player

2      Story books        Story books        Story books        Story books

3      Dictionary         Dictionary         Dictionary         Dictionary

4      Magazines          Magazines          Magazines          Magazines

5      Daily newspaper    Encyclopedia       Daily newspaper    Special place
                                                                to study

6      Encyclopedia and   Daily newspaper    Encyclopedia       Daily newspaper
       special place to
       study (tie)

7      Encyclopedia and   Color TV           Tape recorder      Encyclopedia
       special place to                      and color TV
       study (tie)                           (tie)

9      Color TV           Two or more cars   Tape recorder      Tape recorder
                                             and color TV
                                             (tie)

10     Typewriter         Tape recorder      Two or more cars   Typewriter

11     Two or more cars   Typewriter         Typewriter         Two or more cars

12     Automatic clothes  Automatic clothes  Automatic clothes  Automatic clothes
       dryer              dryer              dryer              dryer

13     Automatic          Automatic          Automatic          Automatic
       dishwasher         dishwasher         dishwasher         dishwasher
C. STUDENT ACHIEVEMENT LEVELS

The distributions of subtest and total scores for students included in all
subsequent analyses are summarized in their raw score form in Table 11.

One should take care to note that different levels of the achievement measure,
containing different numbers of items, are used in grade 3 and in grades 4 and
5. Relative to the length of their instrument, third-grade students tend to
score closer to the upper end of the scale than do their fourth- and fifth-grade
schoolmates. The range and skewness values for each subtest and total score are
presented in Table 12. These data may indicate the presence of a possible
ceiling effect for third-grade students on the Level 2 instrument (three of four
subtests are negatively skewed). The overall proportion of correct responses is
only .63, however, suggesting that this danger is not too great.

The data also indicate considerable variability of scores within the
ESAA-eligible sample, as indicated by the standard deviations and ranges
reported in Tables 11 and 12. While some students are operating below chance
level, others are exhibiting near-perfect performance. Score reliability as
computed using Kuder-Richardson Formula 20 (KR-20), particularly for total
scales, is quite adequate. Reliability values range from 0 to 1 and are an
indication of the homogeneity of the items within a test and the replicability
of results of measurements made at different points in time. High reliabilities
(greater than .90) indicate stable measurements.
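For k items scored right or wrong, KR-20 is r = [k / (k - 1)] x [1 - (sum of p_j q_j) / s_X^2], where p_j is the proportion of students answering item j correctly, q_j = 1 - p_j, and s_X^2 is the variance of the total scores. A minimal sketch, assuming a complete students-by-items matrix of 0/1 scores and the population (divide-by-N) variance; the function name is illustrative:

```python
from statistics import pvariance


def kr20(responses):
    """Kuder-Richardson Formula 20 for a students-by-items matrix of 0/1 scores."""
    n_students = len(responses)
    n_items = len(responses[0])
    total_scores = [sum(row) for row in responses]
    total_variance = pvariance(total_scores)  # population variance of total scores
    sum_pq = 0.0
    for j in range(n_items):
        p = sum(row[j] for row in responses) / n_students  # proportion correct on item j
        sum_pq += p * (1 - p)
    return (n_items / (n_items - 1)) * (1 - sum_pq / total_variance)


# Four students, three items, in a perfect cumulative (Guttman) pattern.
print(kr20([[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]))  # 0.75
```

Some texts use the sample (divide-by-N-1) variance instead; with several thousand students the difference is negligible.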
Table 12. Range and Skewness of Achievement Test Scores by Grade Level

                         Grade 3           Grade 4           Grade 5
Scale            Range Skewness*   Range  Skewness   Range  Skewness
Reading Total       83     0.005      78     0.723      81     0.525
Vocabulary          38    -0.502      38     0.576      39     0.359
Comprehension       45     0.199      40     0.637      42     0.503
Math Total         102    -0.598      71     0.318      72    -0.217
Computation         72    -0.692      48     0.301      48    -0.186
Math Concepts       30    -0.372      25     0.226      25    -0.279
*A distribution is considered skewed when there is a considerably larger number
of extreme cases on one side of the distribution curve than on the other. When
the result is a positive number, the distribution is skewed to the right
(extremely high scores are farther away from the mean than are low scores);
when the result is negative, the distribution is skewed to the left.
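The report does not state which skewness estimator was used. Assuming the common moment coefficient g1 = m3 / m2^(3/2), where m2 and m3 are the second and third central moments of the scores, the sign convention in the footnote can be checked with a short sketch (hypothetical function name and data):

```python
from statistics import fmean


def skewness(scores):
    """Moment coefficient of skewness, g1 = m3 / m2 ** 1.5, using population moments."""
    m = fmean(scores)
    m2 = sum((x - m) ** 2 for x in scores) / len(scores)
    m3 = sum((x - m) ** 3 for x in scores) / len(scores)
    return m3 / m2 ** 1.5


print(skewness([1, 2, 3, 4, 5]))            # 0.0 (symmetric)
print(skewness([10, 10, 10, 10, 2]) < 0)    # True: scores piled at the top, as under a ceiling
```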
SUMMARY OF STUDENT ACHIEVEMENT LEVELS AND BACKGROUND CHARACTERISTICS

The data presented above describe the background characteristics and
achievement levels of students sampled from ESAA-eligible minority-isolated
schools. Inasmuch as the students were randomly sampled within schools across
classrooms, these data should accurately reflect the characteristics and
performances of students from the defined population.

The general picture that one obtains is a student population with a high
percentage of minority students from lower socioeconomic levels (as indicated
by the absence of major appliances). The home environments of these students
include several media that could be of educational value: record players,
books, dictionaries, and magazines.

The achievement levels of these students, when compared to the national norms
for each measure,* indicate depressed levels of performance. Table 13 shows
the percentile rank and grade-equivalent (GE) level associated with the median
performance for each grade on each subtest. Percentile ranks typically hover
around the twentieth percentile level. The students tend to be somewhat
weaker in reading and math concepts than in mathematics computation
performance. Additionally, these students tend to fall further behind grade
level as they advance through school. The latter result indicates that these
students reap less than a full year of learning during a year of schooling.

The need of students in minority-isolated schools is clearly established by
these data. Students eligible for compensatory funds under the definition of
ESAA actually are achieving at depressed levels. In addition to establishing
need, these data can be used as a baseline for comparisons with later
evaluation data.



*Because the mathematics subtests used differ from those originally normed,
special norms were requested and obtained from the test publisher based on
the same data and process as the original norms.
An analysis of the achievement data by ethnic self-identification indicates
that performance levels are not only generally depressed but are differentially
depressed for different subgroups of students. Table 14 shows the achievement
means, standard deviations, and grade equivalents, by ethnic category and grade
level, for each subtest. Among the White, Spanish-background, and Black groups,
an invariable ordering of performance level is present, with White students
always scoring highest, Spanish-background students next, and Black students
lowest. For the Level 2 measure (Grade 3), Blacks are more homogeneous (less
variability in test scores) in reading, while Whites are more homogeneous in
mathematics. For Level 3 (Grades 4 and 5), a similar pattern exists for reading
performance, but mathematics score variability changes with grade level. At
the fourth grade, Black students are the most homogeneous group, while at the
fifth grade, White students are the most homogeneous in mathematics
achievement.

Inspection of the grade equivalents* associated with each of these achievement
levels indicates some interesting patterns. While all students in ESAA-eligible
minority-isolated schools tend to be achieving below grade level and falling
more and more behind as they progress through school, there are differential
patterns of this phenomenon between ethnic groups. Grade equivalents are
expressed in years and months; thus the expected performance at the end of the
third grade should be 3.9, at the end of the fourth grade 4.9, and at the end of
the fifth grade 5.9. While all students are falling behind more each year,
White students are doing so least rapidly, Black students most rapidly, and
Spanish-background students somewhere in between. This indicates that
differences between minority and White students are tending to increase as the
students move from one grade to the next.



*Because these grade equivalents are based on mean performance level, they
are somewhat different from those based on the medians in Table 13. The
medians are the most appropriate indicator of typical performance, but
performance patterns are the same for both indicators.
Table 14. Achievement Test Performance by Ethnic Self-Identification (Sheet 1)

Data for Grade 3

Scale             Group                    Mean   Standard       Grade      N
                                                 Deviation  Equivalent

Vocabulary        Black                  26.857      8.497         2.4   1447
                  Total                 28.0098     8.5192         2.5   2455

Comprehension     Black                  21.106       n.r.        n.r.   1447
                  Total                 22.3853       n.r.         2.7   2455

Reading Total     Black                    n.r.     17.323         2.6   1447
                  Total                 50.3951    17.7102        n.r.   2455

Math Concepts     Black                  17.356      6.380         2.1   1447
                  Total                 18.4525     6.3385         2.3   2455

Math Computation  Other                  48.319     17.089         2.9     91
                  White                  57.262     15.173         3.4    385
                  Spanish Background     50.711     16.950         3.1    532
                  Black                  47.641     17.085         2.9   1447
                  Total                 49.8399    16.7602         3.0   2455

Math Total        Other                  66.868     21.771         2.8     91
                  White                  79.122     19.553         3.4    385
                  Spanish Background     69.664     21.927         2.9    532
                  Black                  64.997     21.910         2.7   1447
                  Total                 68.2924    21.5435         2.8   2455

n.r. = not recoverable (illegible in the original). Rows for the Other, White,
and Spanish Background groups on Vocabulary, Comprehension, Reading Total, and
Math Concepts could not be recovered and are omitted.





Table 14. Achievement Test Performance by Ethnic Self-Identification (Sheet 2)

Data for Grade 4

Scale             Group                    Mean   Standard       Grade      N
                                                 Deviation  Equivalent

Vocabulary        Other                  14.692      7.337         3.3     78
                  White                  17.868      7.877         4.0    379
                  Spanish Background     14.240      6.739         3.2    496
                  Black                  13.202      6.549         3.0   1465
                  Total                 14.1944     6.8335         3.2   2418

Comprehension     Other                  15.295      7.419         3.3     78
                  White                  18.150      7.795         3.9    379
                  Spanish Background     14.633      6.740         3.2    496
                  Black                  13.762      6.135         3.0   1465
                  Total                 14.6778     6.5846         3.2   2418

Reading Total     Other                  29.987     13.817         3.3     78
                  White                  36.018     14.900         4.0    379
                  Spanish Background     28.873     12.274         3.2    496
                  Black                  26.964     11.476         3.0   1465
                  Total                 28.8722    12.3056         3.2   2418

Math Concepts     Other                   9.038      5.810         2.8     78
                  White                  12.309      5.322         3.7    379
                  Spanish Background      9.643      5.051         2.9    496
                  Black                   8.618      4.745         2.7   1465
                  Total                  9.4202     4.9361         2.9   2418

Math Computation  Other                  19.949     10.502         3.6     78
                  White                  25.156     10.453         4.2    379
                  Spanish Background     21.802     10.208         3.8    496
                  Black                  19.091      9.021         3.5   1465
                  Total                 20.6257     9.5518         3.7   2418

Math Total        Other                  28.987     15.197         3.4     78
                  White                  37.464     14.898         4.0    379
                  Spanish Background     31.446     14.269         3.5    496
                  Black                  27.709     12.643         3.3   1465
                  Total                 30.0459    13.4380         3.5   2418
Table 14. Achievement Test Performance by Ethnic Self-Identification (Sheet 3)

Data for Grade 5

Scale             Group                    Mean   Standard       Grade      N
                                                 Deviation  Equivalent

Vocabulary        Other                  20.420      8.874         4.6     81
                  White                  23.474      8.364         5.1    346
                  Spanish Background     18.070      8.154         4.0    502
                  Black                  17.045      7.475         3.8   1499
                  Total                 18.2858     7.7947         4.1   2428

Comprehension     Other                  20.444      8.191         4.5     81
                  White                  23.055      8.228         5.0    346
                  Spanish Background     18.683      7.439         4.0    502
                  Black                  16.874      6.667         3.6   1499
                  Total                 18.2479     7.1200         3.9   2428

Reading Total     Other                  40.864     16.049         4.6     81
                  White                  46.529     15.764         5.1    346
                  Spanish Background     36.753     14.668         4.1    502
                  Black                  33.919     12.962         3.8   1499
                  Total                 36.5338    13.8538         4.1   2428

Math Concepts     Other                  12.975      5.604         3.9     81
                  White                  15.344      5.121         4.9    346
                  Spanish Background     12.588      5.530         3.8    502
                  Black                  11.661      5.246         3.5   1499
                  Total                 12.4213     5.2972         3.7   2428

Math Computation  Other                  26.704     10.537         4.4     81
                  White                  31.173      9.816         4.9    346
                  Spanish Background     28.534      11.2†         4.6    502
                  Black                  25.895     10.328         4.3   1499
                  Total                 27.2199    10.4571         4.4   2428

Math Total        Other                  39.679     15.168         4.2     81
                  White                  46.517     13.940         4.9    346
                  Spanish Background     41.122     15.796         4.3    502
                  Black                  37.556     14.465         4.1   1499
                  Total                 39.6413    14.6919         4.2   2428

†Remaining digits illegible in the original.
In addition to looking at the relationship between ethnic group and achievement
scores, attention was also given to the possible impact of socioeconomic status
(SES) on achievement levels, and its relation to ethnicity. Although no
clear-cut measure of SES was available, a rough indicator of socioeconomic
status was available in the form of the item that asked whether or not
students had certain specified resources in their homes. These resources
clustered into two groups: those that were more educationally oriented (such
as books) and those that were relatively expensive luxury appliances (such as
a dishwasher). Thus two SES scales were derived, each consisting of five items,
and a score was computed for each student representing the number of such items
(ranging from 0 to 5) in his home. Specifically, the first SES scale comprised
the following five items: daily newspaper, dictionary, encyclopedia or other
reference books, story books, and magazines. The items on the second scale
were: tape recorder or cassette player, typewriter, automatic dishwasher, two
or more cars or trucks that run, and an automatic clothes dryer. Overall, the
average number of items in the first group that the students had in their
homes was 3.55, and in the second group it was 1.90. Not surprisingly, both of
these SES scales were significantly related to ethnic group; in both cases,
White students were likely to have the most of these items and students of
Spanish background the least. Tables 15 and 16 show, for each scale, the
percentage distribution and mean score on the scale for each ethnic group and
for the total sample.
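Each SES score is simply a count of the listed resources present in the home. A sketch, with shorthand item labels that are hypothetical stand-ins for the questionnaire codes:

```python
# Shorthand labels for the item #11 resources on each scale (hypothetical codes).
EDUCATIONAL_ITEMS = frozenset(
    {"daily newspaper", "dictionary", "encyclopedia", "story books", "magazines"}
)
APPLIANCE_ITEMS = frozenset(
    {"tape recorder", "typewriter", "dishwasher", "two or more cars", "clothes dryer"}
)


def ses_scores(items_in_home):
    """Return (first SES score, second SES score), each a count from 0 to 5."""
    present = set(items_in_home)
    return len(present & EDUCATIONAL_ITEMS), len(present & APPLIANCE_ITEMS)


# Items outside both scales (e.g., record player, color TV) are ignored.
print(ses_scores(["record player", "dictionary", "story books", "typewriter"]))  # (2, 1)
```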

Next, the possible impact of SES on achievement level was explored. Each of
the two SES measures was related to the student's total reading and total
math scores for each of the three grade levels under study. At all grade
levels, and for both reading and math, each of the SES scales showed a
significant positive relationship with achievement score. In all cases, the
first SES measure, comprised of educationally oriented items, was more strongly
related to achievement than was the second SES scale. Also, for both SES
measures, there was a stronger relationship with reading than with math scores.
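The report does not name the statistic used to relate the SES scales to achievement. Assuming a Pearson correlation tested against zero with n - 2 degrees of freedom, the computation can be sketched as follows; the function names and data are invented for illustration:

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_for_r(r, n):
    """t statistic (df = n - 2) for testing r against zero; undefined when |r| = 1."""
    return r * math.sqrt((n - 2) / (1 - r * r))


# Hypothetical SES scale scores (0-5) and total reading scores for six students.
ses = [1, 2, 3, 3, 4, 5]
reading = [22, 27, 30, 26, 35, 41]
r = pearson_r(ses, reading)
print(round(r, 3), round(t_for_r(r, len(ses)), 2))
```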
Table 15. Ethnic Group and First SES Measure

Scores on First SES Measure (number of educationally-oriented items in home)

Score        Black     Spanish    White    Other   Sample
                    Background
0             2.4%        4.7%     1.7%     4.4%     2.8%
1              5.9        10.2      4.3     10.4      6.7
2             11.4        17.7      9.2     10.4     12.4
3             21.4        21.0     16.1     16.4     20.4
4             27.9        21.8     26.7     25.6     26.3
5             31.0        24.5     42.0     32.8     31.4
Total       100.0%      100.0%   100.0%   100.0%   100.0%
Mean Score    3.59        3.19     3.88     3.47     3.55
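Because each SES score is a count from 0 to 5, the mean score in the last row of Table 15 can be recovered from the percentage distribution as the sum of k x p_k divided by the sum of p_k. The sketch below applies this to the White and Spanish Background columns; small differences in other columns would reflect rounding of the published percentages. The function name is illustrative.

```python
def mean_from_distribution(percent_by_score):
    """Mean scale score from a percentage distribution over the possible scores."""
    total_pct = sum(percent_by_score.values())  # may differ slightly from 100 after rounding
    return sum(score * pct for score, pct in percent_by_score.items()) / total_pct


# White and Spanish Background columns of Table 15.
white = {0: 1.7, 1: 4.3, 2: 9.2, 3: 16.1, 4: 26.7, 5: 42.0}
spanish = {0: 4.7, 1: 10.2, 2: 17.7, 3: 21.0, 4: 21.8, 5: 24.5}
print(round(mean_from_distribution(white), 2))    # 3.88
print(round(mean_from_distribution(spanish), 2))  # 3.19
```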
Table 16. Ethnic Group and Second SES Measure

Scores on Second SES Measure (number of appliances in home)

Score        Black     Spanish    White    Other   Sample
                    Background
0            17.8%       20.3%    12.6%    16.8%    17.5%
1             26.7        28.0     17.8     23.2     25.5
2             25.5        22.7     24.0     26.0     24.7
3             18.4        15.8     21.5     20.0     18.4
4              8.7         8.6     14.1      8.0      9.5
5              3.0         4.7      9.9      6.0      4.5
Total       100.0%      100.0%   100.0%   100.0%   100.0%
Mean Score    1.82        1.79     2.36     1.97     1.90
Finally, the impact of SES on the previously discussed relationship between
ethnic group and achievement level was explored, to determine whether this
relation was merely, or largely, a function of different SES levels in the
different ethnic groups. To explore this, an analysis of covariance of ethnic
group on achievement level was done, using the two SES measures as covariates.
The results of this analysis showed that, while SES had a significant impact on
achievement score, the reading and math scores of the different ethnic groups
remained significantly different even after adjustment for the effect of SES
level. These same results were found in all three grades, with patterns of
differences being identical to those previously reported for unadjusted
scores. A typical result is shown in Table 17, indicating that although the
two SES measures are significantly related to total reading scores for
fourth graders (as indicated by a significant F value for the regression
slope), there still exists a significant difference between ethnic groups on
adjusted scores.



Table 17. Analysis of Covariance for Fourth-Grade Total Reading Scores

Source of Variation       Degrees of      Sums of       Mean    F-Value
                             Freedom      Squares    Squares
Ethnic Group (Adjusted)            3      19441.0     6480.3     45.49*
Slope                              2      22343.3    11171.7     78.42*
Error                           2412     343626.6      142.5

*p < .01
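The entries of Table 17 are tied together by the usual analysis-of-variance arithmetic: each mean square is its sum of squares divided by its degrees of freedom, each F value is an effect mean square divided by the error mean square, and the error degrees of freedom equal N minus the number of groups minus the number of covariates (2,418 - 4 - 2 = 2,412, using the fourth-grade N from Table 14). The sketch below recomputes the table from its sums of squares; it does not fit the covariance model itself, and the function name is hypothetical.

```python
def mean_squares_and_f(effects, ss_error, df_error):
    """effects maps source -> (sum of squares, df); returns source -> (mean square, F)."""
    ms_error = ss_error / df_error
    table = {}
    for source, (ss, df) in effects.items():
        ms = ss / df
        table[source] = (ms, ms / ms_error)
    return table, ms_error


# Fourth-grade total reading (Table 17): N = 2,418 students, 4 groups, 2 SES covariates.
n_students, n_groups, n_covariates = 2418, 4, 2
df_error = n_students - n_groups - n_covariates  # 2412
table, ms_error = mean_squares_and_f(
    {
        "Ethnic Group (Adjusted)": (19441.0, n_groups - 1),
        "Slope": (22343.3, n_covariates),
    },
    ss_error=343626.6,
    df_error=df_error,
)
for source, (ms, f) in table.items():
    print(f"{source}: MS = {ms:.2f}, F = {f:.2f}")
print(f"Error: MS = {ms_error:.2f}")
```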
V. DEVELOPMENT OF UNBIASED ESAA ACHIEVEMENT MEASURES



A. RATIONALE AND METHODOLOGY 

While bias is an emotionally charged term, it has a straightforward technical
meaning. A test is biased if it measures different things for identifiable
subgroups in the population. Often these subgroups are defined along cultural
and ethnic lines. The fact that different groups attain different average
scores does not of itself indicate bias. The bias occurs when the scores are
used for comparing the groups in an inappropriate way. If, for example, a
test purporting to measure reading comprehension is administered to two groups,
and for one group, perhaps because of their cultural background, the test is
more of a vocabulary measure, then the bias occurs when one tries to compare
members of the two groups on the reading comprehension dimension. The test as
a measure of reading comprehension is biased against the group for which it
is primarily a vocabulary measure. However, if the test is used for comparing
individuals within that group on the vocabulary content, then the test
is not biased, since it measures the same content for all members of this group
and is used appropriately.

It is not surprising that many investigators have found evidence of bias
against cultural and ethnic minorities in popular aptitude and achievement
measures. These measures have typically been developed by and for "middle
America" and reflect the content that is thought appropriate to this group. If
these measures favor any group, they favor White middle-class students. To the
extent that definable subgroups share life experience with that group, the
test is appropriate. To the extent that the content reflects factors that
are unique to a particular culture or have culturally specific meanings, the
test is biased in favor of members of that culture and against members of
other cultures. Williams (1974), for example, has developed reading passages
that bias reading comprehension tests in favor of Black students, reversing
the more typical bias in favor of White middle-class students.
A number of methods for investigating test bias have been suggested
(Cardall and Coffman, 1964; Cleary and Hilton, 1968; Sharon and Angoff, 1973;
Green and Draper, 1972). In the absence of an external validity criterion,*
all such methods are concerned with the detection of items that show
differential characteristics in defined subgroups. Two of these approaches
have been used extensively. One approach investigates the items that
contribute to the group-by-item interaction within an analysis-of-variance
framework. The other focuses on methods for maximizing certain psychometric
properties of subsets of items through item analysis techniques, and then
comparing the resultant subsets. Items that are good discriminators in one
subgroup but not in others are identified as biased against those other groups.

While considerable research effort has been expended in developing these
techniques, the research reported in the literature is directed more toward
the statistical methodology than toward the applied problem of identifying bias
in a measure and taking some corrective action. For purposes of the ESAA
evaluation, it was deemed necessary not only to identify the possibly biased
items, but to remove them from the measures, thereby deriving appropriate
and maximally sensitive measures of achievement. It should be noted that by
beginning with measures designed for White middle-class students and then
removing items that demonstrate possible bias against the ESAA subgroup, one
defines a measure representing the educational content and experience common to
both groups.

The following sections describe the steps undertaken to identify possibly
biased items. Briefly, two phases were used in the bias analysis. The first
phase, a statistical analysis, identified items with statistical characteristics
indicating that the items might be biased. However, since such characteristics
could have resulted from random sampling fluctuations, a second phase
investigated the item content more intensively. This second phase, the content
analysis, focused on determining whether each item had a content or form that
might bias it against students in minority-isolated schools. The basis for
content analysis decisions was the consensus of minority-group experts on
measurement theory, testing, subject matter content, compensatory education,
and problems of the disadvantaged student.

*A common method of validating a test is to compare performance on the
test with some independent criterion external to the test. If the test
is correlated with the other criterion, then the test is said to be a
valid measure.

It must be noted that these procedures do not guarantee the identification of
truly biased items, nor do they ensure that any item identified is truly biased.
Rather, these procedures identify items for which the probability that the item
is biased is significantly greater than zero. That is, for the items identified,
the preponderance of available evidence suggests a bias. It is therefore
necessary to remove those items from the scoring when an unbiased measure is
desired. Their removal lessens the chance of including biased items. The
procedures do not rule out the possibility that biased items are still included,
and they do not in any sense "prove" that the items removed are truly biased.

B. STATISTICAL ANALYSIS

The basic data for the statistical analysis came from item-characteristic
indices derived from the test publisher's national standardization sample and
from the special administration to the sample of students enrolled in
ESAA-eligible minority-isolated schools. (See CTB's Bulletin of Technical Data
for descriptions of samples and sampling techniques for the publisher's
standardization.) The logic of these analyses followed the logic used in other
item bias investigations in the absence of an external validity criterion.
Here, however, less concern was given to determining statistical significance
as a criterion for item bias than to identifying "suspicious" items for
further study in the content analysis phase. For this reason the statistical
procedures were modified somewhat, in order to give a better picture of each
item in relation to the other items regardless of level of significance.

In the analysis-of-variance framework for investigating bias, researchers
have analyzed the data in a two-dimensional model that considers items as one
dimension and group membership (e.g., ethnic group, SES level) as the other
dimension. Within this framework the investigator attempts to determine
which items contribute to the item-by-group interaction; that is, which items
demonstrate a difficulty level for a certain group that cannot be reasonably
accounted for by the overall level of the item, the level of the group, or the
general difficulty level of the test. The first procedure used here provides
information directly indicating which items contributed to the item-by-group
interaction, without performing the statistical tests. The items of each of
the four subtests were first ordered by the difficulty values derived from the
publisher's standardization sample, and an item-by-difficulty plot was
prepared. Data from the ESAA-eligible sample were then entered on the same
plot. If no interaction was present, the resulting curves would be similar in
shape but might have differing heights depending on the overall achievement
levels. Items contributing to the item-by-group interaction appeared as
disturbances in the uniformity of the curves. Similar graphs were prepared for
several ethnic subgroups within the ESAA sample.

As an example, a hypothetical case is shown in Figure 1 below, where items 2
and 5 show different characteristics in the two student groups (A and B).
Because of such marked differences, these items are highly suspicious. The
difference in the overall height of the curves indicates that the test is
generally more difficult for Group B. Note that in this example item 2 is
biased against members of Group B, while item 5 is biased in favor of Group B.




[Figure: item difficulty curves for Groups A and B; horizontal axis, Item No.,
in the order 3, 1, 7, 2, 6, 10, 4, 9, 5, 8, 12, 11]

Figure 1. Example of Graphical Detection of Item-by-Group Interaction
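The study relied on visual inspection of plots like Figure 1. As a hypothetical numerical analogue (not the procedure the authors used), the Python sketch below compares two groups' proportions correct on the log-odds scale, where a uniformly harder test shifts every item by roughly the same amount, and flags items whose shift departs from the median shift. The 0.5-logit threshold, the function names, and the data are all illustrative assumptions.

```python
import math
from statistics import median


def logit(p):
    """Log-odds of a proportion correct; requires 0 < p < 1."""
    return math.log(p / (1.0 - p))


def flag_interaction_items(p_ref, p_focal, threshold=0.5):
    """Flag items whose group difference departs from the typical difference.

    p_ref, p_focal: dicts mapping item id -> proportion correct in the
    reference and focal groups. Items are ordered from easiest to hardest in
    the reference group, as on the horizontal axis of Figure 1. The median
    log-odds gap stands in for the overall height difference between the two
    curves; items whose gap differs from it by more than `threshold` logits
    play the role of "disturbances" in the uniformity of the curves.
    """
    order = sorted(p_ref, key=lambda item: p_ref[item], reverse=True)
    gaps = {item: logit(p_ref[item]) - logit(p_focal[item]) for item in order}
    typical_gap = median(gaps.values())
    return [item for item in order if abs(gaps[item] - typical_gap) > threshold]


# Hypothetical proportions correct (not data from this study).
P_REF = {1: 0.90, 2: 0.80, 3: 0.70, 4: 0.60, 5: 0.50, 6: 0.40}
P_FOCAL = {1: 0.80, 2: 0.67, 3: 0.30, 4: 0.43, 5: 0.33, 6: 0.25}

if __name__ == "__main__":
    print(flag_interaction_items(P_REF, P_FOCAL))  # [3]
```

The median, rather than the mean, is used for the typical gap so that a single strongly aberrant item does not pull the reference level toward itself.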
The second approach to the statistical analysis of item bias used information
normally used for test construction. When designing an achievement measure,
the test constructor wants to achieve two general objectives. First, he wishes
to measure one particular content; e.g., a reading vocabulary test should
measure reading vocabulary knowledge and not mathematical computation skills.
Second, the measure should discriminate levels of knowledge of the examinees in
relation to that content. One method of analyzing a set of items under
consideration for inclusion in a test is to examine the point-biserial
correlation coefficients between the score on each item and the total score on
the set of items. An item will exhibit a lower correlation if it measures a
content unrelated to the content measured by most of the other items present.
Thus, this technique can be used to look for biased items. It should be noted
that low point-biserial correlations can also result from other psychometric
properties of the item, but that in published tests one can assume that items
with low correlations for other reasons have already been removed from the
item pool.
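The point-biserial coefficient described above is the Pearson correlation between a 0/1 item score and the total score. The Python sketch below computes it with the usual mean-difference formula; the function names and the small response matrix are hypothetical.

```python
import math


def point_biserial(item, total):
    """Point-biserial correlation between a 0/1 item score and a total score.

    Uses r_pb = (M1 - M0) / s * sqrt(p * q), where M1 and M0 are the mean
    totals of examinees who passed and failed the item, s is the population
    standard deviation of the totals, and p = 1 - q is the proportion passing.
    This equals the Pearson correlation with the item coded 0/1.
    """
    n = len(total)
    passed = [t for x, t in zip(item, total) if x == 1]
    failed = [t for x, t in zip(item, total) if x == 0]
    mean_t = sum(total) / n
    s = math.sqrt(sum((t - mean_t) ** 2 for t in total) / n)
    if not passed or not failed or s == 0:
        return float("nan")  # undefined when there is no variation
    p = len(passed) / n
    m1 = sum(passed) / len(passed)
    m0 = sum(failed) / len(failed)
    return (m1 - m0) / s * math.sqrt(p * (1 - p))


def item_total_correlations(responses):
    """responses: one list of 0/1 item scores per examinee."""
    totals = [sum(row) for row in responses]  # total includes the item itself
    return [point_biserial([row[j] for row in responses], totals)
            for j in range(len(responses[0]))]


# Hypothetical responses: six examinees, four items.
RESPONSES = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 1, 0],
    [0, 0, 0, 0],
]

if __name__ == "__main__":
    print([round(r, 3) for r in item_total_correlations(RESPONSES)])
```

Following the text, the total here is taken over the whole set of items, including the item being correlated. A common refinement, not described in this report, subtracts the item from the total first so that its own score does not inflate the coefficient on short subtests.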

The second procedure, then, was based on the discriminating power of each
item in the subtest. A low point-biserial correlation coefficient indicated
that an item contributed little to total scores for the subtest and might be
measuring a different content from that measured by other items in the
subtest. By comparing the point-biserial values for the defined groups, it was
possible to identify items that did not contribute to total scores for certain
subgroups. Again, the ESAA-eligible minority-isolated sample as a whole was
compared to the standardization sample, and comparisons were made among
subgroups within the ESAA-eligible sample.

Using the above procedures, the restandardization data were analyzed.
In order to identify "suspicious" items for more intensive analysis,
statistical significance criteria were abandoned in favor of a more subjective
review of the statistical results. Three professional staff members with
considerable experience and training in statistical analysis and psychometric
theory jointly reviewed these results and identified the items which, in their
collective opinion, showed aberrant properties.

Two kinds of patterns were typically noted. Some items had different
characteristics (item difficulty or discriminability) for the ESAA-eligible
minority-isolated sample in general but not necessarily among groups within
that sample. For example, in Table 18, the point-biserial correlation
coefficients for Item 1 in the Level 3 Vocabulary subtest for the three main
groups within the ESAA sample are lower than the corresponding coefficient for
the CTB standardization sample, but there are no large differences between
groups within the restandardization sample. (Grade 5 is reported here, but a
similar pattern exists for Grade 4.)

Other items showed considerable variability among subgroups within the
restandardization sample. The second row of Table 18 illustrates this case,
using Item 6 of the Level 3 Comprehension subtest (again using fifth-grade
data). Here the item is a good discriminator for Whites within the
restandardization sample, but its discriminability falls off for the other two
major subgroups.



Table 18. Examples of Aberrant Item Characteristics

                                          ESAA-Eligible Sample
Case (Example Items)                  Black        Spanish   White    CTB Sample
1. Item 1, Level 3 Vocabulary          .29           .31      .27        .50
2. Item 6, Level 3 Comprehension   [illegible]       .34      .56        .56

Entries are item-total point-biserial correlations (fifth-grade data).
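The two patterns just described can be expressed as a simple decision rule on point-biserial values such as those in Table 18. The Python sketch below is a hypothetical formalization of what the three reviewers judged by eye; the `margin` and `spread_tol` thresholds and the function name are illustrative assumptions, not values from the report. Because the Black entry in row 2 is illegible, that example uses only the legible Spanish and White values.

```python
def classify_pattern(group_r, reference_r, margin=0.15, spread_tol=0.10):
    """Classify an item's point-biserial pattern in the spirit of Table 18.

    group_r:     mapping of ESAA subgroup -> point-biserial r
    reference_r: point-biserial r in the publisher's (CTB) sample
    Returns "case 2" if the subgroups disagree among themselves,
    "case 1" if they agree but all fall well below the reference value,
    and "no aberration" otherwise.
    """
    values = list(group_r.values())
    if max(values) - min(values) > spread_tol:
        return "case 2"  # variability among subgroups within the ESAA sample
    if all(reference_r - r > margin for r in values):
        return "case 1"  # whole ESAA sample differs from the reference sample
    return "no aberration"


if __name__ == "__main__":
    # Row 1 of Table 18: Item 1, Level 3 Vocabulary.
    print(classify_pattern({"Black": 0.29, "Spanish": 0.31, "White": 0.27}, 0.50))
    # Row 2 of Table 18 (legible entries only): Item 6, Level 3 Comprehension.
    print(classify_pattern({"Spanish": 0.34, "White": 0.56}, 0.56))
```

With these thresholds, row 1 is classified as the first pattern and row 2 as the second, matching the text.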
In all, 56 items were identified as suspicious: eight from Level 2 and 48
from Level 3. Table 19 indicates the number of suspicious items from each level
and subtest. Clearly, the reading subtests tend to be the most heavily laden
with suspicious items. The mathematics items identified at Level 3 may be
the result of the extreme difficulty values of many of these items for students
within the restandardization samples.



Table 19. Summary of Number of Items Identified as Suspicious, by Level and Subtest

                             Level 2                     Level 3
                      Total Number   Possible     Total Number   Possible
Subtest               of Items       Bias         of Items       Bias
Reading
  Vocabulary               40            5             40           10
  Comprehension            45            3             42           19
Mathematics
  Concepts                 30            0             25            7
  Computation              72            0             48           12
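The observation that the reading subtests carry the most suspicious items can be checked by converting the counts in Table 19 to proportions. This short Python sketch (names are illustrative) gives about 9% of reading items versus none of the mathematics items at Level 2, and about 35% of reading items versus 26% of mathematics items at Level 3.

```python
# Counts transcribed from Table 19: subtest -> (total items, items flagged).
TABLE_19 = {
    "Level 2": {"Vocabulary": (40, 5), "Comprehension": (45, 3),
                "Concepts": (30, 0), "Computation": (72, 0)},
    "Level 3": {"Vocabulary": (40, 10), "Comprehension": (42, 19),
                "Concepts": (25, 7), "Computation": (48, 12)},
}
AREAS = {"Reading": ("Vocabulary", "Comprehension"),
         "Mathematics": ("Concepts", "Computation")}


def flagged_share(level):
    """Proportion of items flagged as suspicious in each content area."""
    shares = {}
    for area, subtests in AREAS.items():
        total = sum(TABLE_19[level][name][0] for name in subtests)
        flagged = sum(TABLE_19[level][name][1] for name in subtests)
        shares[area] = flagged / total
    return shares


if __name__ == "__main__":
    for level in TABLE_19:
        shares = flagged_share(level)
        print(level, {area: f"{share:.1%}" for area, share in shares.items()})
```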



The items identified as statistically suspicious and submitted to content review
are listed in Table 20, along with the reasons for such identification. The
categories indicate the kinds of data that were considered the primary reasons
for flagging a particular item. Extreme difficulty, for example, accounts for
a certain subset of the items. Discriminability in both of the senses described
above and contribution to the item-by-group interaction are the other categories
noted. Both the first (Difficulty) and third (Interaction) columns are related
to difficulty. In the first column, items are either too hard or too easy for a
particular group or groups; in the third, an item's difficulty for a group
departs from what the overall levels of the item and the group would predict.
Table 20. Items Identified as Potentially Biased and Reasons for Identification
(Sheet 1)

Level 2, Grade 3

Subscale         Item #   Difficulty   Discriminability   Interaction
Vocabulary          1         X
                    2         X
                   38                         X
                   39                         X
                   40                         X
Comprehension       2      X(a)
                    3      X(a)
                    7      X(a)

(a) Each of these items is marked in one reason column, but the column is not
legible in the source.

Level 3, Grades 4 & 5

Subscale         Item #   Difficulty   Discriminability   Interaction
Vocabulary          1                         X
                   12                                          X
                   16                                          X
                   17
                   23
                   29                         X
                   32                         X
                   38         X
                   39         X
                   40         X
Table 20. Items Identified as Potentially Biased and Reasons for Identification
(Sheet 2)

Level 3, Grades 4 & 5 (continued)

Subscale         Item #   Difficulty   Discriminability   Interaction
Comprehension       6                         X
                   11                         X
                   12                         X
                   13                                          X
                   15                                          X
                   19                         X
                   20                                          X
                   21         X
                   22         X
                   23                         X
                   28                         X
                   29                                          X
                   30                         X
                   31         X
                   33                         X
                   34                         X
                   40                         X
                   41         X
                   42         X
Math Concepts      11                         X
                   19         X
                   21         X
                   22         X
                   23         X
                   24
                   25         X
Table 20. Items Identified as Potentially Biased and Reasons for Identification
(Sheet 3)

Level 3, Grades 4 & 5 (continued)

Subscale          Item #   Difficulty   Discriminability   Interaction
Math Computation     2                                          X
                    20         X
                    39         X
                    40                         X
                    41                         X
                    42                         X
                    43                                          X
                    44         X
                    45         X
                    46         X
                    47         X
                    48         X
C. CONTENT ANALYSIS

The items identified by the above procedures were suspected of bias because of
their differing statistical properties in different subgroups. However, the
statistical analysis could not indicate exactly what property of an item made
it biased. In fact, the statistical analysis could have identified some items
that were not biased but that had aberrant characteristics because of chance
factors. Since the goal for the ESAA Evaluation was to develop measures that
would not contain bias against any of the subgroups involved in the study, it
was considered necessary to identify the source of potential bias in an item
before removing it.

The review of suspicious items was conducted to determine whether each item had
a content or form that could bias it against one or more of the ESAA subgroups.
If, for example, a reading comprehension item contained item alternatives
requiring knowledge of a particular culture, this item would be considered
potentially biased against students from other cultures. Such an item may not
measure reading comprehension for members of other cultures, and hence could be
biased according to our earlier definition.

In order to represent the subgroups involved in the ESAA sample and to include
several perspectives on the issue of test bias, the review panel was designed to
reflect both the ethnic/cultural structure of the ESAA sample and various points
of view on measurement issues. Eleven panelists were selected, representing
different parts of the country, ethnic subgroups, and substantive points of view.
The panel included three Southern Blacks, two metropolitan Blacks, one Northeastern
Puerto Rican, three Southwestern Mexican Americans, one American Indian, and one
Asian American. Their specific backgrounds were diverse: several were experienced
teachers, two were item construction and test development specialists, and one
each was a principal, a superintendent, and a community leader.

The 11 members of the review panel convened to examine independently the content
of the potentially biased test items and to rate them as biased or not biased. To
accomplish this goal, the following procedures were used:






1. Explanation of Objectives and Procedures

A description of the restandardization study and its objective of establishing
scales that would more appropriately reflect ESAA student achievement gains was
presented. It was emphasized that the statistical analysis of the responses to
the CAT items had identified a number of items that had suspicious statistical
characteristics for certain groups. An explanation and examples were provided
to the panelists to illustrate the important distinction between a "difficult"
item and a "biased" item. A "biased" item was defined as an item having an unusual
difficulty level or correlation with total scores for a certain minority group,
or groups, because of cultural or socioeconomic considerations.

2. Initial Rating Procedure

Of the 56 items examined by the review panel, 48 were taken from Level 3 of the
CAT (administered to fourth- and fifth-grade students) and eight were taken from
Level 2 (administered to third-grade students). Each group of items was preceded
by instructions from the administrator's manual and the examples that were in the
test booklet. The participants were instructed to rate each item as "unbiased,"
"slightly biased," or "more than slightly biased." This breakdown had the
advantage of eliciting slight but important ratings of bias for items that the
respondent might otherwise categorize as unbiased. If an item was rated as
"slightly biased" or "more than slightly biased," the rater was asked to write a
specific reason why the item was seen as biased. All ratings and comments were
made independently and in writing, in order to ensure complete candor and to
remove possible influences of stronger personality types or status. The
participants were encouraged to present their objections at a level of
specificity that all other participants could read and understand. They were
also asked to refrain from discussing any items with the other participants
until the end of the day. After all reviewers had rated the items, the results
were tabulated by the research staff.

3. Initial Scoring of Items

The purpose of the content analysis was to arrive at a consensus of the reviewers
on whether each item was biased, and if so, why. The tabulation, therefore,
looked for consensus both across reviewers and among reviewers within ethnic
subgroups. If there was consensus that a particular item was either "not biased"
or "biased," that item was removed from further review. Items for which there
was disagreement were used in the second round of review.

The two biased categories ("slightly biased" and "more than slightly biased")
were treated the same. However, the various categories of respondents were
weighted, since the number of respondents representing a particular minority
was roughly proportionate to the ESAA sample. The following criteria were
adopted for determining the status of each item:

• There would be consensus that an item was biased if at least
  50% of all the respondents rated it as biased.

• There would be consensus that an item was not biased if at least
  50% of the respondents in each minority category rated it as not
  biased.

• There would be no consensus on an item if at least 50% of the
  respondents in any category rated it as biased but less than
  50% of all respondents rated it as biased. Such an item would be
  presented to the participants for a second rating, with its
  compiled list of objections.
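The three criteria above amount to a small decision procedure. The Python sketch below is one reading of them under stated assumptions: reviewers are grouped into hypothetical ethnic categories built from the panel composition described earlier (Southern and metropolitan Blacks pooled), and because the second and third criteria can both hold when a category splits exactly 50/50, the sketch resolves that tie toward "no consensus," in keeping with the conservative first round. The function and constant names are illustrative.

```python
from collections import defaultdict

# Both "biased" response options were treated the same.
BIASED = {"slightly biased", "more than slightly biased"}


def classify_item(ratings):
    """Apply the first-round consensus criteria to one item.

    ratings: list of (category, rating) pairs, one per reviewer, where rating
    is "unbiased", "slightly biased", or "more than slightly biased".
    Assumed precedence (the criteria overlap when a category splits exactly
    50/50): overall bias first, then any category at 50% or more biased.
    """
    share_biased = sum(r in BIASED for _, r in ratings) / len(ratings)
    if share_biased >= 0.5:
        return "biased"
    by_category = defaultdict(list)
    for category, rating in ratings:
        by_category[category].append(rating in BIASED)
    if any(sum(votes) / len(votes) >= 0.5 for votes in by_category.values()):
        return "no consensus"  # sent to the second rating round
    return "not biased"  # every category is more than 50% "not biased"


# Hypothetical grouping of the 11-member panel into ethnic categories.
PANEL = (["Black"] * 5 + ["Puerto Rican"] + ["Mexican American"] * 3
         + ["American Indian"] + ["Asian American"])

if __name__ == "__main__":
    ratings = [(c, "unbiased") for c in PANEL]
    ratings[5] = ("Puerto Rican", "slightly biased")
    print(classify_item(ratings))  # no consensus
```

The example shows why the first round is conservative: a single objection from a one-member category is enough to send an item to the second round.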

4. Second Rating of the Items

After the tallies of responses from the first round were complete, the resultant 
"no consensus" items and their respective lists of objections were submitted to 
the panel members for review. The participants were asked to read carefully the 
list of objections to each item and then to rate the item a second time. The 
order of presentation of the items was randomized, so that the participants 
could remain in a group situation without influencing each other through 
expressions or remarks. 

If the arguments attending an item were valid, then reviewers who had previously
rated the item as "not biased" were expected to shift their ratings toward a
consensus position. The criterion for consensus in the first rating session was
intentionally conservative in order to provide participants a second opportunity,
and additional information, with which to rate marginal items. In the event that
no shift occurred, consensus within the subgroup or subgroups that had originally
identified the item as possibly biased was used as the criterion for retention
or deletion, since some arguments against an item might be so culturally or
ethnically specific as not to have relevance for members of other subgroups.

D. RESULTS

At the end of the second round, a complete list of the items identified as
possibly biased and the reasons for such identification was compiled. Of the
original 56 "suspicious" items, 16 were agreed to be biased: 3 from Level 2
and 13 from Level 3. All of these were reading items; none of the math
items was considered to be biased. The items identified as biased, with the
reasons given, are listed in Appendix D.

The reasons listed for items being possibly biased were diverse and perceptive.

When a child's experience is confined to an inner-city ghetto, to a particular
region of the country, or exclusively to a city or rural area, he learns very
little outside his own community or environment. His unfamiliarity with certain
objects, concepts, or words described or used in the test prevents him from
being able to answer the items correctly. His knowledge could be more accurately
measured by using topics and words with which he is familiar.

In some of the languages represented by the various ethnic subgroups, different
meanings, connotations, and implications were introduced when an English word
used as an item response alternative was translated into that language.
Also, improper associations resulted when there was not sufficient knowledge of
the double meanings of many English words.

From the results of the content analysis, the final ESAA scales were then
determined by including only those items which were not identified as possibly
biased by either statistical or content review procedures. The psychometric
properties of the resultant scales are summarized in Tables 21 and 22. Table 21
presents the reliability of the full and derived measures for each of the scales
affected by removing items identified as biased. As the values in parentheses
indicate, the reliabilities of the derived scales, when adjusted for test length
by the Spearman-Brown formula,* are just as high as those of the original scales.
Of greater importance are the means for the ESAA and CTB samples, reported for
the affected subscales in Table 22. Here one notes that the removal of the items
identified as possibly biased has a consistently smaller effect on total score
for ESAA-eligible students than for the publisher's standardization sample. This
indicates that the items removed do in fact contribute more to total scores for
the publisher's sample than for the ESAA-eligible sample, and that these items
represent little more than measurement noise for the ESAA-eligible sample.

Table 22. Test Score Means for Full and Derived Scales for Two Reference Groups

                                      ESAA       CTB
Grade 3
  Vocabulary                          28.02     30.93
  Debiased Vocabulary                 26.88     29.46
  Difference                           1.14      1.47
Grade 4
  Vocabulary                          14.28     19.70
  Debiased Vocabulary                 12.39     16.73
  Difference                           1.89      2.97
  Comprehension                       14.71     20.27
  Debiased Comprehension              13.40     18.52
  Difference                           1.31      1.75
Grade 5
  Vocabulary                          18.29     24.69
  Debiased Vocabulary                 15.94     20.71
  Difference                           2.35      3.98
  Comprehension                       18.25     24.43
  Debiased Comprehension              16.70     22.20
  Difference                           1.55      2.23



*The Spearman-Brown prophecy formula, which can be used to estimate the effect
of an increase in test length on reliability, assumes that the items added to the
test are similar to the initial items in difficulty, intercorrelations, and
content. Since reliability is, in part, a function of test length, these
estimates are useful for comparing the original and derived scales.
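For reference, the Spearman-Brown prophecy formula predicts the reliability of a test whose length is multiplied by a factor n from its current reliability rho: rho_n = n * rho / (1 + (n - 1) * rho). Adjusting a derived scale back to the length of the full scale uses n = (items in full scale) / (items in derived scale). The Python sketch below illustrates this; the function name is illustrative and the numbers in the example are hypothetical, not taken from Table 21.

```python
def spearman_brown(reliability, length_factor):
    """Projected reliability when test length is multiplied by length_factor.

    Assumes added (or removed) items are parallel to the existing ones in
    difficulty, intercorrelation, and content.
    """
    n = length_factor
    return n * reliability / (1 + (n - 1) * reliability)


if __name__ == "__main__":
    # Hypothetical: a 30-item derived scale with reliability .85,
    # projected to the 40-item length of the full scale.
    print(round(spearman_brown(0.85, 40 / 30), 3))  # 0.883
```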
VI. PREPARATION OF NORMS

Individual achievement scores describe the number of items that the examinee
answered correctly. Unless the scores have an inherent underlying scale, such
as scores on a true Guttman scale, it is difficult to use these raw scores in
a meaningful way. Appropriate guides are needed for the interpretation of
these scores. The present section addresses the problem of providing a
framework for interpreting the raw scores obtained from ESAA-eligible students.

The first step in this process is the definition of an appropriate scale
structure. In order to make present results maximally compatible with the
normative interpretation of scores provided by the publisher's national norms
tables, it was necessary to define similar kinds of norms. Percentile ranks of
individual raw scores were decided upon as an appropriate interpretive scale
with maximum compatibility with the national norms. Percentile-rank conversions
of a single raw score to the alternative comparison distribution allow quick
assessment of performance relative to the two groups (the national norm group
and the ESAA-eligible population) without further transformations.

Examination of the raw score distributions indicated that all distributions
were fairly regular in shape, though typically somewhat skewed. Because of the
wide variability and skewness, it was decided that the smoothed raw score
distributions (instead of normalized scores) would be used for determining the
conversions.

Separate conversion tables were prepared for individual student scores and school
mean scores. Both were derived in the same manner. The cumulative score
distribution was constructed from the data (individual student scores or school
means). These distributions were then smoothed to minimize the effect of local
irregularities. Because of the extreme regularity of the data, a rolling weighted
average procedure developed by Cureton and Tukey (1957) was employed. Working
from the smoothed curve, new percentile rank values were read from the curve at
the mid-point of each score interval.
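The conversion can be sketched in a few lines. The Python below is a simplified stand-in, not the Cureton-Tukey procedure the study used: it smooths the score frequencies (rather than the cumulative curve) with a (1, 2, 1)/4 rolling weighted average, then computes each score's percentile rank at the midpoint of its interval, that is, the percentage of cases below the score plus half of those at it. The function names and data are hypothetical.

```python
def smooth(freqs):
    """(1, 2, 1)/4 rolling weighted average of score frequencies.

    At each end the missing neighbour is replaced by the end value itself,
    which keeps the total number of cases unchanged.
    """
    padded = [freqs[0]] + list(freqs) + [freqs[-1]]
    return [(padded[i - 1] + 2 * padded[i] + padded[i + 1]) / 4
            for i in range(1, len(padded) - 1)]


def midpoint_percentile_ranks(freqs):
    """Percentile rank of each raw score, read at the interval midpoint."""
    n = sum(freqs)
    ranks, below = [], 0.0
    for f in freqs:
        ranks.append(100 * (below + f / 2) / n)  # cases below + half at score
        below += f
    return ranks


if __name__ == "__main__":
    raw = [1, 3, 6, 3, 1]  # hypothetical frequencies for raw scores 0-4
    print([round(pr, 1) for pr in midpoint_percentile_ranks(smooth(raw))])
    # [5.4, 22.3, 50.0, 77.7, 94.6]
```

Because the smoothing preserves the total count, the ranks stay strictly between 0 and 100, and with a symmetric distribution like this one the middle score lands at exactly the 50th percentile.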
Appendix E contains the tables for converting raw scores to percentile ranks
for each of the subtests and totals at each of the three grade levels, for
both individuals and schools. For any raw score value, these tables give the
percentage of cases falling below that score. In terms of their use in the
ESAA Evaluation, they indicate the approximate percentage of students (or
schools) from the ESAA-eligible population who have obtained a lower score on
that subset of items seen as appropriate for this population, when measured at
the end of their respective grade levels.
VII. SUMMARY

This document has described a series of activities designed to meet several
pre-evaluation goals of the national ESAA Evaluation study. These goals included
the selection of an appropriate achievement measure; a pretest of this
achievement measure to assess the needs of students in schools that would be
eligible to receive funds under the Act; an important research effort directed
toward possible ethnic and/or cultural bias in the achievement measures; and the
establishment of achievement test norms to aid in the interpretation of student
and school performance relative to the appropriate subpopulation.

Toward the achievement of these goals, a substantial test review and selection
activity was undertaken. This activity resulted in the selection of the specific
subtests of the California Achievement Test battery that were seen as best
measuring the outcomes stated as objectives in the Emergency School Aid Act.
These measures were selected on the basis of several criteria, including
appropriate content, good psychometric quality, administrative ease, and clearly
defined national norms.

A nationally representative sample of students in ESAA-eligible minority-isolated
schools was selected to be tested. A standardized administration of the
achievement measure, as well as a questionnaire describing students' backgrounds,
yielded the data that were analyzed and reported upon in the present document.
These data were used in achieving the remaining goals.

Descriptive analysis yielded important baseline data for the evaluation study and
firmly established the basic educational needs of students in eligible schools.
This highly concentrated minority subpopulation demonstrates achievement levels
significantly below those expected for their grade level. While both reading and
mathematics achievement are depressed, mathematics performance is slightly better.
Results indicate that minority students in minority-isolated schools demonstrate
lower performance than their White schoolmates, even though the latter are
themselves significantly below their expected performance levels.

Additional descriptive analyses provided information on the characteristics
of the sample. These data, particularly ethnic self-identification and
socioeconomic indices, are seen as crucial for establishing the comparability of
future study groups to this reference group. Such comparability is the essence
of the validity of such references.

Investigation of potential ethnic or cultural bias in item content and form
through statistical analysis and content review yielded a small subset of items
for which there was some evidence indicating possible bias. It is suggested
that for certain uses of the achievement measures, scores should be calculated
without these items. Such uses would include any situation where one compares
groups of students (schools, programs, etc.) that are composed of significantly
different proportions of students from different ethnic groups or different SES
levels.

A final product of this research is the establishment of subpopulation norms
based on end-of-year performance of a representative sample of students enrolled
in ESAA-eligible minority-isolated schools. Such norms may be of significant
use to the local evaluator in assessing the performance of an individual student
or school relative to this reference group. Reference norms are provided both
for complete subtests and for subtests excluding those items identified as
possibly biased, and are structured for use with either individual student scores
or school mean scores.

The research reported provides important baseline and supportive data for 
assessing the adequacy of the measures to be used in the national ESAA Evaluation 
study. As a result of this research, a clear academic need has been established 
for students in the defined population, and appropriate references have been 
derived for evaluation purposes. 
APPENDIX A 

LETTERS TO SUPERINTENDENTS OF THE DISTRICTS, 
CHIEF SCHOOL OFFICERS, AND COMMISSIONERS OF THE HEW REGIONS 
DEPARTMENT OF HEALTH, EDUCATION, AND WELFARE 

OFFICE OF EDUCATION 
WASHINGTON, D.C. 20202 



Dear Superintendent: 

As you probably know, the Emergency School Aid Act (Title VII of Public 
Law 92-318) provides for grants to local educational agencies (1) to meet 
special needs incident to the elimination of minority group segregation 
and discrimination; (2) to encourage the voluntary elimination, reduction, 
or prevention of minority group isolation; and (3) to aid school children 
in overcoming the educational disadvantages of minority group isolation. 

The U.S. Office of Education is charged with responsibility for evaluating 
the impact of these grants. In this connection, we will be arranging for 
a series of special achievement tests in school districts receiving awards. 
These tests will begin in September, 1973. If your district applies for 
and receives an award, you may be contacted at a later date regarding the 
September, 1973 testing. 

This letter's purpose is to ask your cooperation in a limited norms testing 
effort scheduled for May of this year. Minority norms for standardized tests 
do not exist. We believe that it is important, in measuring the achievement 
of minority group children, to do so against norms established for these 
children themselves, as well as against norms established for the nation as 
a whole. For this reason, we have drawn a nationally representative sample 
of 100 minority group isolated schools located in districts which, like yours, 
meet at least one of the eligibility criteria for ESAA awards. We have 
arranged for an independent agency, the American College Testing Program, to 
administer standardized reading and mathematics tests to approximately 30 
third graders, 30 fourth graders, and 30 fifth graders in each of these 100 
schools, at a time in May to be determined by mutual agreement. Test results 
will be analyzed to obtain pre-award norms for students in minority group 
isolated schools. 
One or more schools in your district were randomly selected, from the 
population of minority isolated schools, for inclusion in a national 
sample. Schools selected in your district are listed at the end of this 
letter. 



We request your permission to have American College Testing Program 
representatives administer tests to approximately 90 students in each school 
listed. The tests will require only a few hours of your students' time. 
The American College Testing Program will conduct the tests and will supply 
all needed materials. None of your teachers or other staff members will be 
required to assist during test administration, unless you would prefer to 
have them present. 

Test results for individual students and schools will be completely 
confidential. A report of norms, summarizing national results for the entire 
group tested, will be published, but it will not contain any identification 
of participating schools or districts. We will provide you with copies of 
this report as soon as it is available. 

We are well aware of the fact that the norms testing activity will result in 
some disruption in your students' scheduling. We ask your cooperation only 
because we believe that this activity is of considerable importance. We have 
made every effort to keep our sample small and to reduce the imposition on 
schools to a bare minimum. In return, we will be able to establish, for the 
first time, a set of norms for students in minority group isolated schools. 
These norms will be a valuable tool, not just to us in carrying out our 
responsibilities for national evaluations, but also to you and to all other 
educators who are working with minority group children. 

We are most anxious to have your cooperation in this effort. Could you 
appoint a member of your staff who can discuss details with us? It would 
help considerably if you could telephone us as soon as possible, naming such 
a point of contact. Your phone call should be to Dr. Michael J. Wargo of my 
office, at (202) 963-4613. 

Correspondence regarding the May, 1973, norms testing activities should be 
directed to Dr. Michael J. Wargo, Office of Planning Budgeting and Evaluation, 
Room 4079, U.S. Office of Education, Washington, D.C. 20202. If you wish 
information on ESAA Grants, however, please direct inquiries to the U.S. 
Regional Education Office whose address is given at the end of this letter. 
The participation of your district in these testing activities has no bearing 
on ESAA grant procedures or decisions. 



Sincerely yours, 



John W. Evans 

Assistant Commissioner for 
Program Planning and Evaluation 
District Name: 
State: 

Address of Regional Office for information on ESAA Grants: 



School(s) selected in your district:* 



* If a selected school is missing one or more grades from the three 
we will be testing (3rd, 4th, and 5th grades), or if it is wholly 
or partially ungraded at these levels, we will need to use special 
procedures to select students for testing. We will discuss these 
with you by telephone. 
LETTER TO CHIEF STATE SCHOOL OFFICERS 



DEPARTMENT OF HEALTH, EDUCATION, AND WELFARE 

OFFICE OF EDUCATION 
WASHINGTON, D.C. 20202 



The enclosures to this letter are provided to let you know of testing 
activities we are arranging to carry out in your State in connection 
with studies we will be making of Emergency School Aid Act (ESAA) 
Projects. I am sorry that we have not been able to give you earlier notice 
of our plans. As you may know, there have been delays in publishing the 
final versions of the ESAA regulations. These delays have made it necessary 
for us to omit a number of our originally planned announcement and coordination 
activities. 

In the near future, we will send you a more complete description of our ESAA 
evaluations. In the meantime, if you have any questions, please call Dr. 
Michael J. Wargo in my office, phone (202) 963-4613. 

Again, I very much regret the delays in our schedule which have prevented 
earlier notification. I appreciate your consideration and assistance. 



Sincerely yours, 




Enclosures: 



State Districts and Schools Selected 
Letter to Superintendents 



cc: Coordinator of State Committee on Evaluation 
and Information Systems 
DEPARTMENT OF HEALTH, EDUCATION, AND WELFARE 

LETTER TO REGIONAL COMMISSIONERS 

OFFICE OF EDUCATION 
WASHINGTON, D.C. 20202 



The enclosures to this letter are provided to let you know of testing 
activities we are arranging to carry out in your region, in connection 
with studies we will be making of Emergency School Aid Act (ESAA) Projects. 
Similar material is also being sent to Chief State School Officers and to 
Coordinators of Committees on Evaluation and Information Systems in States 
affected by these activities. 

I am sorry that we have not been able to give you earlier notice of our 
plans. As you may know, there have been delays in publishing the final 
versions of the ESAA regulations. These delays have made it necessary for 
us to omit a number of our originally planned announcement and coordination 
activities. 

In the near future, we will send you a more complete description of our ESAA 
evaluations. In the meantime, if you have any questions, please call 
Dr. Michael J. Wargo in my office, phone (202) 963-4613. 

Again, I very much regret the delays in our schedule which have prevented 
earlier notification. I appreciate your consideration and assistance. 



Sincerely yours, 



John W. Evans 

Assistant Commissioner for 
Program Planning and Evaluation 




Enclosures: 
RESPONSES TO STUDENT BACKGROUND QUESTIONNAIRE 

Form cleared by U.S. Office 
of Management and Budget: 
OMB No. 
Approved through July 31, 1973 

District _____ 
Date _____ 
Reference Number _____ 

STUDENT BACKGROUND QUESTIONNAIRE 

For each question, put an "X" in the box or boxes next to the statements that 
apply to you. 

DO NOT ANSWER A QUESTION UNTIL IT HAS BEEN READ ALOUD AND EXPLAINED. 

ALL GRADES 



1. How long have you been going to this school? 

[ ] (A) Since preschool or kindergarten. 27% 
[ ] (B) Since first grade. 24% 
[ ] (C) Since second grade. 11% 
[ ] (D) Since third grade. 14% 
[ ] (E) Since fourth grade. 8% 
[ ] (F) I am new this year. 15% 

2. How many different schools have you gone to since kindergarten? 

[ ] (A) Only one school. 42% 
[ ] (B) Two schools. 33% 
[ ] (C) Three schools. 15% 
[ ] (D) Four schools. 5% 
[ ] (E) More than four different schools. 4% 

3. How many people live with you in your home besides yourself? 

[ ] (A) Only one other person. 2% 
[ ] (B) Two other people. 7% 
[ ] (C) Three or four other people. 29% 
[ ] (D) Five to six other people. 32% 
[ ] (E) Seven or more other people. 30% 

4. Did you do any reading at home during the past two weeks that 
was not school work? 

[ ] (A) Yes. 75% 
[ ] (B) No. 25% 

5. Did you do any school work at home during the past two weeks? 

[ ] (A) Yes. 67% 
[ ] (B) No. 32% 

6. Is there anyone in your home that can help you with your school 
work? 

[ ] (A) Yes. (ANSWER Q. 7.) 93% 
[ ] (B) No. (SKIP Q. 7 AND GO TO Q. 8.) 7% 

7. (IF YES TO Q. 6) Did you receive any help with your school work 
at home during the past two weeks? 

[ ] (A) Yes. 53% 
[ ] (B) No. 41% 

8. Do the people in your home usually speak another language 
besides English? 

[ ] (A) No, they usually speak English. 74% 
[ ] (B) Yes, Spanish. 18% 
[ ] (C) Yes, an American Indian language. 1% 
[ ] (D) Yes, Chinese. 0.6% 
[ ] (E) Yes, Japanese. 0.2% 
[ ] (F) Yes, some other language. 4% 

9. Check the box that best describes yourself. 

[ ] (A) Black (Negro). 63% 
[ ] (B) Oriental (Japanese, Chinese, etc.). 1.2% 
[ ] (C) American Indian. 3.4% 
[ ] (D) White. 24% 
[ ] (E) Other (Eskimo, Hawaiian, etc.). 6% 

10. Would you consider yourself of Spanish background (Mexican, 
Cuban, Puerto Rican, Latin American, etc.)? 

[ ] (A) Yes. 21% 
[ ] (B) No. 78% 

11. Which of the following do you have in your home? 

[ ] (A) Daily newspaper. 65% 
[ ] (B) Dictionary. 78% 
[ ] (C) Encyclopedia or other reference books. 59% 
[ ] (D) Story books. 83% 
[ ] (E) Magazines. 70% 
[ ] (F) Record player. 89% 
[ ] (G) Tape recorder or cassette player. 54% 
[ ] (H) Color television. 56% 
[ ] (I) Typewriter. 46% 
[ ] (J) Automatic dishwasher. 13% 
[ ] (K) Two or more cars or trucks that run. 47% 
[ ] (L) Automatic clothes dryer. 31% 
[ ] (M) A special place to study. 58% 

12. How many hours a day do you usually watch television? 

[ ] (A) Most days, I do not watch television at all. 7% 
[ ] (B) Most days, I watch television some, but less than 
        one hour. 12% 
[ ] (C) Most days, I watch television one or two hours. 14% 
[ ] (D) Most days, I watch television two or three hours. 13% 
[ ] (E) Most days, I watch television more than three hours. 50% 
Breakdown of Item Responses for Student Background Questionnaire by Grade Level 

Item  Response   Grade 3   Grade 4   Grade 5 
 1      A          32%       26%       22% 
       B-F       (illegible in source) 
 2      A        (illegible in source) 
        B          32%       33%       32% 
        C          12%       14%       18% 
        D        (partly illegible; 5% and 7% legible) 
        E           3%        4%        5% 
 3      A           2%        2%        2% 
        B           7%        7%        7% 
        C          28%       29%       29% 
        D          32%       32%       32% 
        E          30%       31%       30% 
 4      A          70%       76%       78% 
        B          29%       24%       22% 
 5      A          63%       67%       71% 
        B          36%       32%       28% 
 6      A          92%       94%       92% 
        B           7%        6%        7% 
 7      A          59%       56%       44% 
        B          35%       39%       48% 
 8   (% answering yes) 
        A          73%       73%       77% 
        B          19%       20%       16% 
        C           2%        2%        1% 
        D           1%        1%      0.4% 
        E         0.3%      0.2%      0.1% 
        F           4%        4%        4% 
Breakdown of Item Responses for Student Background Questionnaire by Grade Level 
(continued) 

Item  Response   Grade 3   Grade 4   Grade 5 
 9      A          62%       63%       65% 
        B         1.6%      0.9%      0.9% 
        C         2.6%        3%      4.5% 
        D          25%       24%       23% 
        E         6.6%      6.9%      4.3% 
10      A          21%       20%       20% 
        B          77%       78%       78% 
11   (% answering yes) 
        A          62%       64%       69% 
        B          73%       78%       83% 
        C          53%       60%       64% 
        D          82%       83%       84% 
        E          68%       71%       73% 
        F          85%       89%       92% 
        G          50%       54%       58% 
        H          57%       56%       53% 
        I          46%       46%       47% 
        J          15%       14%       11% 
        K          48%       46%       48% 
        L          32%       31%       29% 
        M          64%       55%       54% 
12      A           9%        7%        5% 
        B          15%       12%       11% 
        C          13%       13%       15% 
        D          10%       13%       15% 
        E          46%       52%       52% 
APPENDIX C 

RESPONSES TO STUDENT BACKGROUND QUESTIONNAIRE BY ETHNIC GROUP 
Breakdown of Item Responses for Student Background by the Modified Ethnic Categories 

All differences in response patterns between races across all grades are significant 
beyond the .01 level except for question #7 and part G (tape recorder) on #11. 

Item  Response   Black   White   Spanish   Other 
 1      A        (partly illegible; 30% and 31% legible) 
        B        (partly illegible; 20%, 23%, and 17% legible) 
        C        (partly illegible; 12%, 11%, and 8% legible) 
        D         13%     17%      14%      16% 
        E          8%      8%       7%       8% 
        F        (partly illegible; 24%, 16%, and 20% legible) 
 2      A         46%     32%      41%      42% 
        B         33%     34%      35%      26% 
        C         14%     21%      14%      18% 
        D          4%      7%       6%       9% 
        E          3%      6%       4%       6% 
 3      A          2%      1%       2%       3% 
        B          6%      7%       7%       9% 
        C         25%     44%      29%      27% 
        D         34%     31%      31%      33% 
        E        (illegible in source) 
 4     A-B       (illegible in source) 
 5      A        (illegible in source) 
        B         34%     24%      34%      29% 
 6      A         95%     94%      88%      94% 
        B          5%      6%      12%       6% 
 7      A         57%     57%      55%      53% 
        B         43%     43%      45%      47% 
 8   (% answering yes) 
        A         91%     79%      25%      56% 
        B          3%     12%      68%      19% 
        C        1.2%    0.2%     1.4%     5.6% 
        D        0.3%    0.2%     0.4%     8.4% 
        E        0.1%    0.1%     0.3%       2% 
        F        3.2%    6.3%     2.9%       8% 
Breakdown of Item Responses for Student Background by the Modified Ethnic Categories 
(continued) 

Item  Response   Black   White   Spanish   Other 
 9      A        100%      0%      18%       0% 
        B          0%      0%       2%      26% 
        C          0%      0%       9%      46% 
        D          0%    100%      46%       0% 
        E          0%      0%      25%      28% 
10      A          0%      0%     100%       0% 
        B        100%    100%       0%     100% 
11   (% answering yes) 
        A         66%     68%      59%      60% 
        B         79%     84%      69%      80% 
        C         59%     69%      51%      59% 
        D         85%     88%      75%      80% 
        E         70%     79%      65%      60% 
        F         91%     87%      84%      83% 
        G         54%     56%      50%      56% 
        H         55%     64%      50%      54% 
        I         46%     51%      41%      51% 
        J         10%     23%      14%      14% 
        K         44%     61%      46%      44% 
        L         28%     46%      27%      32% 
        M         59%     59%      49%      67% 
12      A          8%      6%       7%       9% 
        B         13%     11%      12%      13% 
        C         12%     20%      16%      18% 
        D         12%     17%      14%      14% 
        E         55%     46%      51%      47% 
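The note accompanying the ethnic-group breakdown states that differences are significant beyond the .01 level, but the report does not say which test produced that result. Purely as an illustration, the sketch below assumes a Pearson chi-square test of independence on a table of response counts by group. The counts are hypothetical (the report gives only percentages), and 6.635 is the upper 1% point of the chi-square distribution with one degree of freedom.

```python
from typing import List, Tuple


def chi_square_independence(table: List[List[int]]) -> Tuple[float, int]:
    """Pearson chi-square statistic and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if response and group membership were independent.
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Hypothetical counts (not from the report): rows = "yes"/"no", columns = two groups.
counts = [[10, 20],
          [20, 10]]
stat, dof = chi_square_independence(counts)

# Compare with the 1% critical value for one degree of freedom (about 6.635).
significant_at_01 = dof == 1 and stat > 6.635
```

A real analysis of this table would use one column per ethnic category (Black, White, Spanish, Other) and the critical value for the resulting degrees of freedom; with samples as large as this one, even modest differences in percentages can clear the .01 threshold.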
APPENDIX D 



ITEMS IDENTIFIED AS POTENTIALLY BIASED, WITH REVIEWERS' COMMENTS 

CAT LEVEL 2, FORM A 

READING VOCABULARY: Each item consists of a short stimulus phrase, in which 
one word is printed in bold type, and four single-word response choices. 
The student is asked to mark the response choice which has the "best" 
meaning for the word in bold type. 



ITEM 38* 

American Indian Teacher: Disadvantaged inner-city children understand the word 
"trip" in a jargon of the drug culture that gives "trip" a different meaning. 

Metropolitan Black Administrator and Asian American Teacher: Inner-city poor 
students, regardless of ethnicity, take very few trips. Those that are 
taken are never referred to as journeys. 

Metropolitan Puerto Rican Teacher: Puerto Rican children would have little 
experience with trips or other terms referring to trips, such as "journey." 

Southwestern Mexican American Teacher: Mexican American children would have 
little experience with trips or other terms referring to trips. 

Southern Black Administrator: "Tractor," one of the distractors, would be 
completely unfamiliar to inner-city kids and as such may be particularly 
attractive to inner-city respondents. 

ITEM 39 

Metropolitan Black Administrator, Metropolitan Puerto Rican Teacher, and 
Southwestern Mexican American Teacher: Street signs in the ghetto are abbreviated, 
that is, "Central Ave." A ghetto child relying on his experience may not 
be able to associate "Ave." with "Avenue." 

American Indian Teacher: Children living in rural areas would not be familiar 
with the term "Avenue." 

Southern Black Teacher: "Avenue" and "arena" would have no meaning to a child 
from a southern town. 

Southwestern Mexican American Test Developer: The choices would have little 
meaning for a rural child. 

Southern Black Test Developer: Rural Blacks will be unfamiliar with streets 
and avenues. 



* Items in the Level 2 test booklets are not numbered. Item numbers here 
refer to the sequence number of the item within its subtest. 
ITEM 40 

Metropolitan Puerto Rican Teacher: "Full" for Spanish American children is 
something filled, not something added up. 

American Indian Teacher: Concepts for "total" might be "whole" or "all," but 
not "full." "Full" for some Indian children only relates to objects that 
are associated with containers. 

CAT LEVEL 3, FORM A 



READING VOCABULARY: Each item consists of a short stimulus phrase, in which 
one word is printed in bold type, and four single-word response choices. 
The student is asked to mark the response choice which has the "best" meaning 
for the word in bold type. 

ITEM 12 

Southern Black Administrator: "Pluck" is not a common word in disadvantaged 
cultural groups. "Pluck" is not part of the vocabulary of Southern Blacks. 

Metropolitan Black Administrator: Black students do not have the opportunity 
to pluck strings. 

Asian American Teacher: In order to know what is meant, a child must have 
been exposed to experiences in "plucking" the strings of a violin or other 
instrument. 

Southwestern Mexican American Teacher: "Plucking of strings" is not part of 
the experience of most Spanish American children. 

Metropolitan Black Teacher: Some children have experienced "pluck" meaning 
"to pull," as in "plucking the feathers" from a chicken. They would not 
realize that "pluck" could also mean "pick." 
ITEM 16 

Southwestern Mexican American Test Developer: Mexican American children may 
literally translate "offer" as "ofrecer," which does not necessarily mean 
"to present." 

Southern Black Administrator: Many inner-city and rural Blacks use the term 
"give a gift" rather than "offer a gift." 

ITEM 17 

Southwestern Mexican American Teacher, Southern Black Administrator, and 
Southwestern Mexican American Community Leader: The word "ship" is biased in 
favor of certain regions. The whole concept of a vibrating ship would be 
unfamiliar to many children. 

Metropolitan Black Administrator: Most poor kids would not have had the 
opportunity to experience the sensation of vibrations. 

Asian American Teacher: If the word "vibrating" were used with another noun, 
such as "car," which is more common to children of any region and economic 
level, more children could discern the meaning of "vibrating." 

American Indian Teacher: Some children have never seen a ship or even a large 
body of water. 

Metropolitan Black Teacher: The distractor "whirling" may be unfamiliar to 
disadvantaged children. 

Southwestern Mexican American Test Developer: "Vibrating" is a much more 
familiar word to higher socioeconomic children than to disadvantaged 
children. 



ITEM 23 

Southwestern Mexican American Test Developer: Biased against rural children. 
"Plant" is more closely associated with flowers. Also, Spanish American 
children are more familiar with tortilla factories. 

Puerto Rican Teacher: The word "plant" would mean trees or flowers to a 
Spanish child and not a factory. 

American Indian Teacher: Factories are not a familiar sight in some areas. 
Some children would think "plant" only applies to trees and flowers and 
would not associate it with a factory. 

Asian American Teacher and Metropolitan Black Administrator: This question 
assumes that the child has been exposed to some concept of industry and 
he is familiar with the use of the word "plant" in connection with industry. 
Rural children would not be aware of this usage. 



ITEM 29 

Southern Black Administrator, Southern Black Teacher, Metropolitan Black 
Administrator, Asian American Teacher, Southwestern Mexican American 
Teacher: Building a house that requires a plan is not a common experience 
of the poor. 

Southern Black Administrator: "Design" would be an unfamiliar concept to the 
rural child of low income. 

Mexican American Community Leader and American Indian Teacher: The distractors 
cause bias in the use of two similar words, "describe" and "plan." A plan 
is in part a description. 



ITEM 32 

Puerto Rican Teacher: Disadvantaged children would associate "principal" with 
school principal. 

American Indian Teacher and Mexican American Teacher: The term "school" as one of 
the distractors would mislead disadvantaged children in that they only 
associate "principal" with principal of a school. 
Asian American Teacher: Poor inner-city children may correlate the school and 
the principal as authority with the "law" and may be handicapped by their 
limited experience. 

Metropolitan Black Administrator: The word "school" biases the item for all 
kids who are familiar with "principal of school." 

Southern Black Administrator: "Law," for the children of deprived areas, has 
a different meaning from what it has for middle-class children. 

Southwestern Mexican American Test Developer: Disadvantaged children will 
associate "principal" with school principal and with disciplinary action. 

ITEM 39 

Southwestern Mexican American Test Developer and Southwestern Mexican American 
Teacher: "Deserted" is translated in Spanish as leaving or breaking a 
friendship, which is "discouraging" and "dismaying." 

Metropolitan Black Teacher: The first three distractors are all fairly close 
in that you can be discouraged by all three. 

Asian American Teacher: "Discourage" is related to "deserted" in a highly transient 
population. 

ITEM 40 

Metropolitan Puerto Rican Teacher, Southwestern Mexican American Teacher, and 
Southwestern Mexican American Test Developer: Children are more familiar 
with roots of a plant than with roots of problems. 

Asian American Teacher: Problems may trigger emotional responses and may be 
linked with fear, particularly for children who have experienced 
discrimination or prejudice. 
READING COMPREHENSION: Each item requires the student to read a passage and 
answer several questions measuring comprehension of the content of that passage. 
Items 11 and 13 refer to a passage describing the geography and resources of 
Canada. Items 19 and 21 refer to a passage describing the process of erosion, 
especially as demonstrated by rivers. Item 34 refers to a passage describing 
the characteristics and the study of the chimpanzee in his natural habitat. 



ITEM 11 

Southwestern Mexican American Test Developer: Rural children and children 
who do not live close to oceans are disadvantaged in that they are not 
familiar with "harbor," "rapids," and "ice bound." 

Southern Black Administrator and Mexican American Community Leader: The 
inner-city child has little understanding of the relationship of mining 
to natural resources. 

Metropolitan Black Administrator, Asian American Teacher, and Southwestern 
Mexican American Teacher: Most ghetto kids have very little experience 
outside their own community. Their knowledge can be more accurately 
measured by using a topic with which they are familiar rather than mining 
and natural resources. 

American Indian Teacher: Mining is a term that is associated only with coal 
in certain parts of the southwest. Children with a second language will 
probably not comprehend or interpret the mining references in the 
paragraph. 

ITEM 13 

Southwestern Mexican American Teacher and Southwestern Mexican American Test 
Developer: "Land-locked" is an unfamiliar concept, and "unsettled" may 
be interpreted as "pioneer land" in the historical sense. 

American Indian Teacher: Children living in interior regions will have no 
concept of "land-locked" areas. 

Asian American Teacher: Low socioeconomic groups or those who live in the 
desert or in densely populated areas would have no familiarity with the 
relationships of land-use patterns to natural resources. 

Metropolitan Black Administrator: Southerners and other kids with little 
familiarity with Canada would be unable to relate to the entire question. 
ITEM 19 

Southwestern Mexican American Test Developer: For some children "course" is 
more closely associated with class or golf course than with area. 

Metropolitan Black Administrator: Urban kids, especially those from Southern 
California, have little experience of this kind. 

American Indian Teacher: Some areas of the country are not familiar with 
rivers; they are more familiar with streams. 

Mexican American Teacher: Biased against parts of the country where rivers 
are an unknown phenomenon. 

ITEM 21 

Metropolitan Puerto Rican Teacher, Southwestern Mexican American Teacher, and 
American Indian Teacher: Children in some parts of the country have never 
seen the places that are described. 

Asian American Teacher, Metropolitan Black Administrator, Southern Black 
Administrator, and Southern Black Administrator: In some regions, children 
will have this information on which to draw, versus others who must entirely 
deduce the information from the passage. 

ITEM 34 

Metropolitan Black Administrator: The entire article is biased because of 
the vocabulary used, which is not part of the ghetto experience. Such 
words as "equatorial," "captivity," "nomadic," "dainty morsels," "vegetarian," 
and "encroaching" would be unfamiliar. 

Southern Black Administrator: Many of the things described will be unfamiliar 
to low socioeconomic groups. 

Southwestern Mexican American Test Developer and Metropolitan Black Teacher: 

The choices assume that the child is familiar with the behavior of antelopes 
which may not be true of lower socioeconomic children. 
APPENDIX E 

SCORE-TO-PERCENTILE-RANK CONVERSION TABLES AND SCHOOL-MEANS-TO-PERCENTILE-RANK 
CONVERSION TABLES FOR GRADES 3, 4, AND 5 

Instructions for Using the ESAA-Eligible Minority-Isolated Norm Tables 



The tables labeled "Raw Score to Percentile Rank" and "School Means to Percentile 
Rank" for grades 3, 4, and 5 provide norm information at the individual student 
level and the school level, respectively. The tables are based on the cumulative 
distribution of the raw achievement scores on the various subtests at the 
individual student level and the cumulative distribution of the school means 
at the school level. For both types of tables, percentile ranks are indicated 
in relation to the raw scores on the original achievement subtests and the debiased 
achievement subtests. A percentile rank gives the percentage of students in a 
given reference group that obtained scores equal to or less than a certain 
score. Percentile ranks represent the relative quality or rank order of each 
score in comparison with all other scores earned by that reference group, and are 
comparable from test to test for the same reference group. 
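The definition above (the percentage of the reference group scoring at or below a given score) can be sketched as follows. This is a minimal illustration, assuming the reference group's raw scores on one subtest are available as a list; the scores and the helper name `percentile_rank_table` are hypothetical, and the published tables may have used a different rounding or interpolation convention.

```python
from bisect import bisect_right
from typing import Dict, List


def percentile_rank_table(reference_scores: List[int], max_score: int) -> Dict[int, int]:
    """Map each possible raw score 0..max_score to the percentage of the
    reference group scoring equal to or less than that score."""
    ordered = sorted(reference_scores)
    n = len(ordered)
    # bisect_right counts how many reference scores are <= s.
    return {s: round(100 * bisect_right(ordered, s) / n) for s in range(max_score + 1)}


# Hypothetical reference-group raw scores on a 10-item subtest.
reference = [2, 3, 3, 4, 5, 5, 5, 6, 8, 9]
table = percentile_rank_table(reference, max_score=10)
# table[5] is 70: seven of the ten reference scores are 5 or lower.
```

Building one such table per subtest and per version (original and debiased) mirrors the layout the instructions describe: a raw score is located in the appropriate column and read across to its percentile rank.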

If you desire norms (percentile ranks) for an individual student in relation to 
the other students in the sample, use the "Raw Score to Percentile Rank" tables 
for the appropriate grade level. Find the raw score under the column for the 
specific subtest or total of the original or debiased version. Read across 
the page to the left or right margin on the same line to find the corresponding 
percentile rank. 

The school mean norms are most useful for evaluative purposes, and are presented 
in the "School Means to Percentile Rank" tables for grades 3, 4, and 5. These 
norms should be used for comparing the average performance of students at one 
school relative to the performance of other ESAA-eligible minority-isolated 
schools across the nation. To use these tables, find the raw score school mean 
under the column for the specific subtest or total of the original or debiased 
version. Read across the page to the left or right margin on the same line to 
find the corresponding percentile rank. 
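The school-level procedure differs from the student-level one in that a school's mean is ranked against the distribution of school means, not against individual scores. A minimal sketch, assuming the reference school means and one school's student raw scores are available as lists; all values and the helper name are hypothetical.

```python
from statistics import mean
from typing import List, Tuple


def school_mean_percentile(school_scores: List[float],
                           reference_means: List[float]) -> Tuple[float, int]:
    """Return a school's mean raw score and its percentile rank among reference
    school means (percentage of reference means equal to or less than it)."""
    m = mean(school_scores)
    at_or_below = sum(1 for r in reference_means if r <= m)
    return m, round(100 * at_or_below / len(reference_means))


# Hypothetical reference distribution: mean raw score of each norming school.
reference_means = [18.2, 19.5, 20.1, 21.0, 21.7, 22.4, 23.0, 24.6]

# Hypothetical raw scores of the students tested at one local school.
local = [20, 22, 25, 19, 23, 21]
m, pr = school_mean_percentile(local, reference_means)
# m is about 21.67; four of the eight reference means are at or below it, so pr is 50.
```

School means vary far less than individual scores, so looking up a school mean in the individual-student table would understate how unusual a high or low school average is; that is why a separate school-means table is needed.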
MEMORANDUM   DEPARTMENT OF HEALTH, EDUCATION, AND WELFARE 

OFFICE OF EDUCATION 

TO: Readers of: Achievement Test Restandardization   DATE: January 1[?], 1975 

FROM: Michael J. Wargo, ESAA Evaluation Program Officer 

SUBJECT: (1) Ethical principles on release of test items to general public. 
(2) Hypothetical examples of biased test items. 



Reviewers of Achievement Test Restandardization have suggested that 
the Office of Education release to the public some examples of 
achievement test items determined to be biased against minority 
students. Such a practice would be inconsistent with the Ethical 
Standards of Psychologists (American Psychological Association) and the 
Standards for Development and Use of Educational and Psychological 
Tests (National Council on Measurement in Education, American Educational 
Research Association, and American Psychological Association), which 
forbid the release of standardized test items to the general public on 
the grounds that such release would invalidate test items and possibly 
the entire test. Qualified test users, to whom the above restriction does 
not apply, can identify items that were determined to be biased by matching 
the item numbers in Appendix D with the items on a copy of the published test. 

Since this report does not include actual examples of biased test items,
two hypothetical examples are provided below for illustrative purposes.
They are structured and formatted like the actual items that the reported
study determined to be biased.



Example Biased Items 

The following examples are designed to illustrate how an item in a
standardized test can be biased against particular subgroups of students.
The examples were designed for students in grades 3, 4, and 5 and could be
part of a Reading Vocabulary subtest of any standardized achievement test.







Instructions: For each of the items below, choose the word with the best
meaning for the word underlined. Circle the word with the best meaning.

1. Comfortable den
      bath
      animal
    * study
      sofa

2. Fast boulevard
      traffic
    * street
      trip
      stream
Explanation of Possible Bias

Item 1: The higher the socioeconomic status of a family, the greater the
likelihood that the family's home will contain a spare room that might be
referred to as a "den." Further, if such a room exists, its use as a
"study" would tend to increase as the family's socioeconomic status
increases. Therefore, one would expect poor minority group children living
in homes with large families to have had less exposure to the word "den"
than their more advantaged peers, and to have a less clear understanding of
its possible use as a "study." In short, Item 1 might be biased against
such children in its ability to measure reading vocabulary.

Item 2: Children from higher socioeconomic status families are generally
exposed to more varied reading materials and travel experiences. The
probability of such children encountering the word "boulevard" and seeing
an actual boulevard is greater than it is for their more disadvantaged
peers. Inner-city and rural minority group members are therefore less
likely to be familiar with the word "boulevard," and the item may be biased
against them in measuring their reading achievement.



